<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Subspace clustering with gravitation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jiwu Zhao</string-name>
          <email>zhao@cs.uni-duesseldorf.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Heinrich-Heine University, Institute of Computer Science, Databases and Information Systems</institution>
          <addr-line>Universitaetsstr. 1, D-40225 Duesseldorf</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <abstract>
        <p>Data mining is the process of discovering and exploiting hidden patterns in data. Clustering, an important data mining task, divides observations into groups (clusters) according to the principle that observations in the same cluster are similar and observations from different clusters are dissimilar. Subspace clustering enables clustering in subspaces of a data set, so clusters can be found not only in the full space but also in subspaces. The well-known subspace clustering methods share a common problem: their parameters are hard to determine. To address this issue, a new bottom-up subspace clustering method is introduced in this article. It uses a gravitation function to select data and dimensions through a self-comparison technique. The parameter decision is easy and does not depend on the amount of data, which makes subspace clustering more practical.</p>
      </abstract>
      <kwd-group>
        <kwd>data mining</kwd>
        <kwd>subspace clustering</kwd>
        <kwd>gravitation</kwd>
        <kwd>high dimensional data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1.1 Related work</title>
      <p>DENCLUE [9] is a density-based clustering algorithm that uses a Gaussian kernel as its density function and hill climbing to find cluster centers. DENCLUE 2.0 [8] is an improvement on DENCLUE. These algorithms differ from other density-based approaches in that they calculate the density at each data point instead of over an area of the attribute space. DENCLUE does not have to estimate the number or the position of clusters, because the clustering is based on the density information of each point. However, it is still necessary to estimate the parameters of these two algorithms, such as σ and ξ in DENCLUE and h, p in DENCLUE 2.0. Besides, they are not designed for subspace clustering. Applying Newton's universal law of gravitation to clustering is not a novel idea. A gravitational clustering algorithm [7] simulates the movement of objects under gravitational force and detects clusters from merged objects. A shrinking-based approach [14] inspired by gravitation is a grid-based clustering method that shrinks the objects in a grid cell towards the data centroid and thereby finds the clusters. However, for each of these algorithms we have to find appropriate values for its parameters.</p>
    </sec>
    <sec id="sec-2">
      <title>1.2 Contributions of the paper</title>
      <p>In this paper, we introduce a new density-based bottom-up subspace clustering method called SUGRA (SUbspace clustering with a GRAvitation function). The basic idea is similar to DENCLUE, but instead of the Gaussian kernel we use a gravitation function with scaled distances to represent the density function, and the objects do not have to move towards the centroid. From this simple idea we found an interesting property: in a one-dimensional subspace, almost all cluster objects and non-cluster objects (noise) are separated by a constant. With this property we can detect clusters very distinctly; at the same time, SUGRA reduces the number of parameters needed for subspace clustering.</p>
      <p>The remainder of this paper is organized as follows. The idea of SUGRA is introduced in Section 2: Sections 2.1 and 2.2 give the definitions of clusters and of the gravitation function respectively, and Section 2.3 presents the SUGRA algorithm. The last section contains conclusions and areas of future work.</p>
    </sec>
    <sec id="sec-3">
      <title>2. SUBSPACE CLUSTERING WITH GRAVITATION</title>
    </sec>
    <sec id="sec-5">
      <title>2.1 Definition of data set and subspace cluster</title>
      <p>A data set consists of objects and their attributes. Usually all objects in a data set have common attributes, such as color, price, length, etc., and every object has a value for each attribute. The values related to one attribute lie in the same domain and conform to the same constraints. A data set can be described as a table, where the objects are the rows and the attributes are the columns. The attributes can also be considered as dimensions, so that each attribute represents one dimension and the objects are points in these dimensions.</p>
      <p>In order to describe the attributes and objects clearly, they are defined as follows:</p>
      <p>Definition 1. (Data set) Generally, a data set D can be considered as a pair, the combination of A and O:</p>
      <p>D = (A, O)</p>
      <p>where A is the set of all attributes (dimensions):</p>
      <p>A = {a1, a2, ..., ai, ...} (1)</p>
      <p>and O is the set of all objects:</p>
      <p>O = {o1, o2, ..., op, ...} (2)</p>
      <p>where op is an object with its values on A:</p>
      <p>op = {op,a1, op,a2, ..., op,ai, ...} (3)</p>
      <p>We denote the values of all objects on attribute ai with:</p>
      <p>oai = {o1,ai, o2,ai, ..., op,ai, ...} (4)</p>
      <p>Definition 2. (Subspace cluster) A subspace cluster S is also a data set and is defined as follows:</p>
      <p>S = (Ã, Õ) (5)</p>
      <p>where Ã ⊆ A and Õ ⊆ O, and S must satisfy a particular condition C, which is defined differently in every subspace clustering algorithm.</p>
      <p>A subspace cluster S can then be written like this:</p>
      <p>S = (Ã, Õ) = ({a1, a12, a60, ...}, {o1, o5, o30, ...})</p>
      <p>The cardinalities of S regarding objects and dimensions are defined respectively as:</p>
      <p>|S|O = |Õ|, |S|A = |Ã| (6)</p>
      <p>Remark 1. Suppose S1 = (A1, O1) and S2 = (A2, O2) are two subspace clusters. Then:</p>
      <p>If A1 ≠ A2 or O1 ≠ O2, then S1 ≠ S2: subspace clusters with different dimensions or objects are considered different ones.</p>
      <p>For every A0 ⊆ A1, S0 = (A0, O1) is also a subspace cluster.</p>
      <p>If A1 ⊂ A2 and O1 = O2, or A1 = A2 and O1 ⊂ O2, then S1 &lt; S2. Only the largest subspace cluster is kept in the clustering result.</p>
      <p>Definition 3. The intersection of two subspace clusters is defined as follows:</p>
      <p>S1 ∩ S2 = (A1 ∪ A2, O1 ∩ O2) (7)</p>
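      <p>A minimal sketch (with hypothetical attribute and object labels) illustrates Definition 3: the dimension sets are unioned while the object sets are intersected.</p>

```python
# Definition 3 on two toy subspace clusters, each represented as a
# (dimensions, objects) pair of frozensets.
def intersect(s1, s2):
    a1, o1 = s1
    a2, o2 = s2
    return (a1 | a2, o1 & o2)  # (A1 union A2, O1 intersection O2)

S1 = (frozenset({"a1"}), frozenset({"o1", "o5", "o30"}))
S2 = (frozenset({"a2"}), frozenset({"o5", "o7", "o30"}))
# the result covers both dimensions and the shared objects o5 and o30
result = intersect(S1, S2)
```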
      <p>Definition 4. Sai denotes the set of all subspace clusters found in dimension ai, and SD denotes the set of all subspace clusters found in D; finding SD is the task of subspace clustering.</p>
    </sec>
    <sec id="sec-6">
      <title>2.2 Gravitation</title>
      <p>Gravitation is a natural phenomenon describing the force of attraction between objects with mass; it is important because it influences our everyday lives. Newton's law of universal gravitation is defined as follows:</p>
      <p>F = G m1 m2 / r^2 (8)</p>
      <p>where F is the gravitational force between two point masses, G is the gravitational constant, m1 and m2 are the masses of the two points, and r is the distance between the two point masses.</p>
      <p>SUGRA uses the gravitation function as a measure between objects. To make the calculation easier, a simple gravity function is used here:</p>
      <p>Definition 5. (Simple gravity function)</p>
      <p>G_pq^ai = m_p m_q / r_pq^2 (9)</p>
      <p>Suppose that every single object op has mass mp = 1. The distance rpq between op and oq is defined as</p>
      <p>r_pq = l_pq / (L / (N - 1)) (10)</p>
      <p>where L = max(oai) - min(oai) is the length of this dimension and N = |O| is the number of objects. rpq is thus the proportion of the real distance lpq to the average interval L/(N-1).</p>
      <p>Remark 2. If rpq = 0, op and oq stand at the same place in ai. To keep G_pq^ai calculable and to get a logical result, we set rpq greater than 0 but smaller than any other distance. One idea is to set rpq to half of the minimum positive distance in ai: if om, on have the minimum distance rmn &gt; 0 in ai, then setting rpq = rmn/2 makes sure that G_pq^ai &gt; G_mn^ai, which is the expected behavior.</p>
      <p>Remark 3. rpq is defined as a proportional distance rather than a real distance because this brings data with different value ranges onto the same scale. For example, the attributes age and salary obviously have two different ranges, but with such a proportional distance the two attributes can be calculated and compared together. The following definitions are also important for the SUGRA algorithm.</p>
      <p>Definition 6. (Gravitation of an object) The gravitation of an object op in dimension ai is defined as the sum of the gravitation between op and all other objects:</p>
      <p>G_p^ai = Σ_{q, q≠p} G_pq^ai (11)</p>
      <p>An object that lies near others (a cluster object) has a greater gravitation than an object far from others (a non-cluster object) (see Figure 1 (b)).</p>
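      <p>The gravitation of Definitions 5 and 6 can be sketched in a few lines of NumPy. This is a hypothetical helper, not the authors' implementation; it assumes a dimension with at least two distinct values.</p>

```python
import numpy as np

def gravitation(values):
    """Gravitation G_p of every object in one dimension (Definitions 5, 6).

    All masses are 1, distances are scaled by the average interval
    L/(N-1), and a zero distance is replaced by half of the minimum
    positive distance (Remark 2).
    """
    o = np.asarray(values, dtype=float)
    N = len(o)
    L = o.max() - o.min()
    r = np.abs(o[:, None] - o[None, :]) / (L / (N - 1))  # scaled r_pq
    r[r == 0] = r[r > 0].min() / 2                       # Remark 2
    g = 1.0 / r**2                                       # G_pq = 1 / r_pq^2
    np.fill_diagonal(g, 0.0)                             # exclude q = p
    return g.sum(axis=1)                                 # G_p = sum over q
```

      <p>On uniformly spaced objects the middle object's gravitation approaches π²/3 ≈ 3.29, matching the average gravitation of Definition 7, while edge objects get smaller values.</p>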
      <p>Definition 7. (Average gravitation) The average gravitation Ḡ^ai of dimension ai is the gravitation of the middle object of uniformly distributed objects in ai. Ḡ^ai is drawn as a dotted line in Figure 1.</p>
      <p>Suppose om is the object in the middle. Ḡ^ai can then be calculated as follows:</p>
      <p>Ḡ^ai = Σ_{p, p≠m} L^2 / (l_mp^2 (N-1)^2) = 2 Σ_{n=1}^{(N-1)/2} 1/n^2 → 2 · π^2/6 = π^2/3 ≈ 3.29 (for N → ∞)</p>
      <p>[Figure 1: Gravitation of each object (solid) and the average gravitation (dotted line) for objects 0-20 in (a) a uniform distribution and (b) a distribution with clusters.]</p>
      <p>Remark 4. The gravitation of an object as defined in (11) has the following property in one dimension: an object in the middle has a greater gravitation value than one at the edge, which can be seen clearly when the objects are distributed uniformly (see Figure 1 (a)).</p>
      <p>Remark 5. The gravitation values of cluster objects and non-cluster objects differ greatly: non-cluster objects usually have very small gravitation values, while cluster objects have larger values. This property is very important for the clustering process.</p>
      <p>If a data set has many objects, i.e. N is large, then Ḡ^ai ≈ 3.29. From experiments we found that using the average gravitation Ḡ^ai as a threshold to separate cluster objects from non-cluster objects returns good results. This is not a silver bullet, but it is a good starting point: the threshold can be adjusted around this value.</p>
      <p>Figure 2 shows an example of SUGRA on two-dimensional data. The value 3.29 separates the gravitation in each of the two dimensions, where the red points illustrate the gravitation of random points.</p>
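      <p>The limit in Definition 7 is easy to verify numerically; the following small check (illustrative only) evaluates the partial sums 2·Σ 1/n² and compares them with π²/3.</p>

```python
import math

def avg_gravitation(N):
    """Average gravitation of the middle of N uniformly spaced objects:
    2 * sum_{n=1}^{(N-1)/2} 1/n^2 (see Definition 7)."""
    half = (N - 1) // 2
    return 2 * sum(1.0 / n**2 for n in range(1, half + 1))

limit = math.pi**2 / 3  # the N -> infinity limit, about 3.29
```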
    </sec>
    <sec id="sec-7">
      <title>2.3 Algorithm of SUGRA</title>
      <p>This algorithm consists of two steps: 1. data selection (clustering in one-dimensional spaces) and 2. dimension selection (clustering in high-dimensional spaces). As a bottom-up algorithm, SUGRA first handles the data in one-dimensional spaces, because one-dimensional data can be dealt with easily. Finding clusters in high-dimensional spaces is then based on the clusters found in one dimension.</p>
      <sec id="sec-7-1">
        <title>2.3.1 Data selection</title>
        <p>Algorithm 1: Data selection</p>
        <p>Input: D = (A, O). Output: Sai.
1  foreach ai ∈ A do
2    sort oai in ascending order
3    initialize t = 1
4    foreach op,ai ∈ oai do
5      if G_p^ai &gt; Ḡ^ai then
6        if o(p-1),ai ∈ St,ai and |op,ai - o(p-1),ai| &lt; L/(N-1) then
7          add op,ai to St,ai
8        else
9          t++
10         add op,ai to St,ai
11       end
12     end
13   end
14 end</p>
        <p>As discussed above, the clusters are first selected in each dimension through the gravitation. First of all, oai is sorted in ascending order. If G_p^ai has a greater value than the threshold Ḡ^ai, op,ai is chosen as a cluster candidate. If its neighbor o(p-1),ai is also a cluster candidate and their distance is smaller than the average distance, they are considered to belong to one cluster; otherwise op,ai starts a new cluster. The process stops when no more new clusters are found in ai, and it is the same for the other dimensions. Algorithm 1 shows the details of the data selection.</p>
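        <p>Algorithm 1 can be sketched as follows. This is a hypothetical, self-contained helper, not the authors' code: the threshold defaults to the average gravitation ≈ 3.29, and the merge distance is the average interval L/(N-1).</p>

```python
import numpy as np

def data_selection(values, threshold=3.29):
    """Sketch of Algorithm 1: one-dimensional clusters by gravitation."""
    o = np.asarray(values, dtype=float)
    N = len(o)
    avg = (o.max() - o.min()) / (N - 1)      # average interval L/(N-1)
    # gravitation of every object (Definition 6) with scaled distances
    r = np.abs(o[:, None] - o[None, :]) / avg
    r[r == 0] = r[r > 0].min() / 2           # Remark 2
    g = 1.0 / r**2
    np.fill_diagonal(g, 0.0)
    g = g.sum(axis=1)
    clusters, current, prev = [], [], None
    for idx in np.argsort(o):                # visit o_ai in ascending order
        if g[idx] > threshold:               # cluster candidate
            if prev is not None and avg > o[idx] - o[prev]:
                current.append(idx)          # close to previous candidate
            else:
                if current:
                    clusters.append(current)
                current = [idx]              # open a new cluster
            prev = idx
    if current:
        clusters.append(current)
    return clusters
```

        <p>For example, two tight groups with a lone point in between yield two clusters, and the lone point falls below the threshold and is left out as noise.</p>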
        <p>After the data selection process we obtain subspace clustering results Sai in every one-dimensional space; a subspace cluster St ∈ Sai has the form</p>
        <p>St = ({ai}, {o1, o5, o9, ...}) (12)</p>
      </sec>
      <sec id="sec-7-2">
        <title>2.3.2 Dimension selection</title>
        <p>Algorithm 2: Dimension selection</p>
        <p>Input: Sai. Output: SD.
1 add all St ∈ Sai to SD
2 foreach S ∈ SD do
3   while there is an S' ∈ SD with S ≠ S' do
4     if |S ∩ S'|O ≥ 2 then
5       add S ∩ S' to SD
6     end
7   end
8 end
9 return SD</p>
        <p>In the data selection, one-dimensional clusters of the form (12) were found; finding subspace clusters in higher dimensions is based on the intersection defined in (7). For two subspace clusters S1 and S2, if |S1 ∩ S2|O ≥ 2, then S1 ∩ S2 is a new subspace cluster.</p>
        <p>Every combination of clusters is checked through the intersection; this process stops when no more new clusters are found. The final result keeps only the largest subspace clusters. The detailed algorithm is shown in Algorithm 2.</p>
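        <p>Algorithm 2 can be sketched with clusters represented as (dimensions, objects) pairs of frozensets. Again this is a hypothetical helper, including a simple pruning of non-maximal clusters in the spirit of Remark 1.</p>

```python
def dimension_selection(one_dim_clusters, min_objects=2):
    """Sketch of Algorithm 2: grow subspace clusters by intersection."""
    result = set(one_dim_clusters)           # each cluster: (dims, objects)
    grown = True
    while grown:                             # stop when no new cluster appears
        grown = False
        for s1 in list(result):
            for s2 in list(result):
                if s1 == s2:
                    continue
                # Definition 3: (A1 union A2, O1 intersection O2)
                cand = (s1[0] | s2[0], s1[1] & s2[1])
                if len(cand[1]) >= min_objects and cand not in result:
                    result.add(cand)
                    grown = True
    # keep only the largest clusters: drop any cluster whose dimensions
    # and objects are both contained in another cluster's
    return {s for s in result
            if not any(t != s and s[0].issubset(t[0]) and s[1].issubset(t[1])
                       for t in result)}
```

        <p>For instance, two one-dimensional clusters sharing at least two objects produce a two-dimensional cluster on exactly those shared objects.</p>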
      </sec>
    </sec>
    <sec id="sec-8">
      <title>2.4 Further discussions</title>
      <p>Choosing parameters is usually difficult for a subspace clustering algorithm: a small deviation may cause a different result. The boundaries between cluster objects and non-cluster objects are especially indistinct, and they can hardly be recognized. SUGRA uses the gravitation function, which marks cluster and non-cluster objects with great differences and therefore makes the parameter decision easy.</p>
      <p>Data selection. Experimental experience shows that Ḡ^ai separates cluster objects from non-cluster objects very well in the data selection. As derived in Definition 7, Ḡ^ai ≈ 3.29 does not depend on |O| and can be used as the threshold in almost all situations. If the results are not satisfying, the threshold can be set a little smaller or greater.</p>
      <p>Dimension selection. The condition |S1 ∩ S2|O ≥ 2 is used in the dimension selection to decide whether S1 ∩ S2 is a subspace cluster. The condition indicates that an object group with at least two objects is taken as a new cluster. This setting has a high precision, because not only big clusters but also small clusters can be found, but it naturally takes more time. In contrast, choosing a greater number may save time but lose some interesting small clusters.</p>
      <p>In the dimension selection, every possible combination of subspace clusters may be examined, so the worst-case run time of the dimension selection is O(2^m), where m is the number of one-dimensional subspace clusters found in the data selection.</p>
    </sec>
    <sec id="sec-9">
      <title>3. CONCLUSIONS</title>
      <p>Subspace clustering is able to discover clusters and extract their features from subspaces of high-dimensional data, which is commonly gathered in many fields. Most familiar subspace clustering approaches have problems with determining their parameters. We apply the gravitation function to subspace clustering in order to find a new method that makes the determination of the parameters easier. The method, named SUGRA, belongs to the bottom-up algorithms: first it finds clusters using the gravitation function in one-dimensional spaces, then it combines these clusters in higher dimensions to search for high-dimensional subspace clusters.</p>
      <p>In SUGRA, non-cluster objects always have low gravitation values (&lt;3.29), while cluster objects have very large values, which depend on the clusters' density and number of objects. The value 3.29 does not depend on the number of objects, so it can separate out the non-cluster objects in order to obtain the cluster objects. We do not have to choose parameters as in other algorithms; with this threshold SUGRA gets an almost satisfying result from the start. Future research will focus on optimizing the gravitation function and the algorithm in order to improve the subspace clustering results. The gravitation technique should be used not only on one-dimensional data but also directly in multiple dimensions. Another task is to let SUGRA adapt to various data types, such as categorical data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Procopiuc</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Park</surname>
          </string-name>
          .
          <article-title>Fast algorithms for projected clustering</article-title>
          .
          <source>In Proceedings of the 1999 ACM SIGMOD international conference on Management of data</source>
          , pages
          <fpage>61</fpage>
          -
          <lpage>72</lpage>
          , Philadelphia, Pennsylvania, United States, May 31-June 03
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          and
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>Finding generalized projected clusters in high dimensional spaces</article-title>
          .
          <source>In Proceedings of the 2000 ACM SIGMOD international conference on Management of data</source>
          , pages
          <fpage>70</fpage>
          -
          <lpage>81</lpage>
          , Dallas, Texas, United States, May 15-18
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gunopulos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          .
          <article-title>Automatic subspace clustering of high dimensional data for data mining applications</article-title>
          .
          <source>In Proceedings of the 1998 ACM SIGMOD international conference on Management of data</source>
          , pages
          <fpage>94</fpage>
          -
          <lpage>105</lpage>
          , Seattle, Washington, United States, June 01-04
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Fu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Entropy-based subspace clustering for mining numerical data</article-title>
          .
          <source>In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          , pages
          <fpage>84</fpage>
          -
          <lpage>93</lpage>
          , San Diego, California, United States, August 15-18
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Friedman</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Meulman</surname>
          </string-name>
          .
          <article-title>Clustering objects on subsets of attributes</article-title>
          .
          <source>Journal of the Royal Statistical Society: Series B (Statistical Methodology)</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Goil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Nagesh</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Choudhary</surname>
          </string-name>
          .
          <article-title>MAFIA: Efficient and scalable subspace clustering for very large data sets</article-title>
          .
          <source>Technical Report CPDC-TR-9906-010</source>
          , Northwestern University,
          <year>June 1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dasgupta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Nasraoui</surname>
          </string-name>
          .
          <article-title>A new gravitational clustering algorithm</article-title>
          .
          <source>In Proc. of the SIAM Int. Conf. on Data Mining (SDM)</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hinneburg</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.-H.</given-names>
            <surname>Gabriel</surname>
          </string-name>
          .
          <article-title>DENCLUE 2.0: Fast clustering based on kernel density estimation</article-title>
          .
          <source>In Proc. of International Symposium on Intelligent Data Analysis 2007 (IDA'07)</source>
          , Ljubljana, Slowenien,
          <year>2007</year>
          . LNAI Springer.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hinneburg</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Keim</surname>
          </string-name>
          .
          <article-title>An efficient approach to clustering in multimedia databases with noise</article-title>
          .
          <source>In Proc. 4th Int. Conf. on Knowledge Discovery and Data Mining</source>
          , New York,
          <year>1998</year>
          . AAAI Press.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.-S.</given-names>
            <surname>Jin</surname>
          </string-name>
          .
          <article-title>A new cell-based clustering method for large, high-dimensional data in data mining applications</article-title>
          .
          <source>In Proceedings of the 2002 ACM symposium on Applied computing</source>
          , pages
          <fpage>11</fpage>
          -
          <lpage>14</lpage>
          , Madrid, Spain,
          <year>March 2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>Clustering through decision tree construction</article-title>
          .
          <source>In Proceedings of the ninth international conference on Information and knowledge management</source>
          , pages
          <fpage>20</fpage>
          -
          <lpage>29</lpage>
          , McLean, Virginia, United States, November 06-11
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Parsons</surname>
          </string-name>
          , E. Haque, and
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Subspace clustering for high dimensional data: A review</article-title>
          .
          <source>Sigkdd Explorations</source>
          ,
          <volume>6</volume>
          :
          <fpage>90</fpage>
          -
          <fpage>105</fpage>
          ,
          <year>June 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Procopiuc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. K.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Murali</surname>
          </string-name>
          .
          <article-title>A monte carlo algorithm for fast projective clustering</article-title>
          .
          <source>In Proceedings of the 2002 ACM SIGMOD international conference on Management of data, Madison</source>
          , Wisconsin, June 03-06
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>A shrinking-based approach for multi-dimensional data analysis</article-title>
          .
          <source>In VLDB '2003: Proceedings of the 29th international conference on Very large data bases</source>
          , pages
          <volume>440</volume>
          -
          <fpage>451</fpage>
          . VLDB Endowment,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.-G.</given-names>
            <surname>Woo</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>FINDIT: a Fast and Intelligent Subspace Clustering Algorithm using Dimension Voting</article-title>
          .
          <source>PhD thesis</source>
          , Korea Advanced Institute of Science and Technology, Taejon, Korea,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>δ-clusters: Capturing subspace correlation in a large data set</article-title>
          .
          <source>In Proceedings of the 18th International Conference on Data Engineering</source>
          , page
          <volume>517</volume>
          , February 26-March 01
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <source>Automatische Parameterbestimmung durch Gravitation in Subspace Clustering. 21. Workshop "Grundlagen von Datenbanken"</source>
          ,
          <source>Rostock</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>