<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>The Projective Clustering Ensemble Problem for Advanced Data Clustering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>EXTENDED ABSTRACT</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlotta Domeniconi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Gullo</string-name>
          <email>gullof@acm.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Tagarelli</string-name>
          <email>andrea.tagarelli@unical.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>George Mason University</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>UniCredit, R&amp;D Dept.</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Calabria</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>After more than five decades, a huge number of models and algorithms have been developed for data clustering. While most attention has been devoted to data types, algorithmic features, and application targets, in recent years there has also been increasing interest in developing advanced data-clustering tools. In this respect, projective clustering and clustering ensembles represent two of the most important directions: the former is concerned with the discovery of subsets of the input data having different, possibly overlapping subsets of features associated with them, while the latter allows for the induction of a prototype consensus clustering from an available ensemble of clustering solutions. In this paper we discuss the current state-of-the-art research in which the problems of projective clustering and clustering ensembles have been revisited and integrated in a unified framework, called Projective Clustering Ensembles (PCE). We discuss how PCE was originally formalized as either a two-objective or a single-objective optimization problem, and how the limitations of such early approaches have been overcome by a metacluster-based formulation. We also summarize the main empirical results, and provide pointers for future research.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Given a set of data objects as points in a multi-dimensional space (or feature space),
the problem of clustering consists in discovering a number of homogeneous subsets
of data, called clusters, which are well-separated from each other [8]. Clustering is a
key step for a myriad of data-management/data-mining tasks that require the
discovery of unknown relationships and patterns in large datasets. Although most clustering
approaches usually provide a single clustering solution and/or use the same (typically
large) feature space, recent advances have focused on two main aspects: (i)
dealing with high dimensionality, and (ii) handling multiple clustering solutions.</p>
      <p>Several problems of practical interest are inherently high-dimensional, i.e., they
involve data objects represented by large sets of features. A common scenario with
high-dimensional data is that several clusters may exist in different subspaces that correspond
to different combinations of features. In fact, it is unlikely that all features of the data
objects are equally relevant to form meaningful clusters.</p>
      <p>Another challenge in the clustering process is that in many real-life domains
multiple, and potentially meaningful groupings of the input data are available, hence
providing different views of the data. For instance, in genomics, multiple clustering solutions
would be needed to capture the multiple functional roles of genes. In text mining,
documents discuss multiple topics, thus their grouping by content should reflect different
informative views which correspond to multiple (possibly alternative) clusterings.</p>
      <p>[Fig. 1 here: an example projective ensemble of three projective clustering solutions C', C'', C''' over a dataset D, and the projective consensus clustering C* derived from them.]</p>
      <p>
        Recent advances in data clustering have led to the definition of the problems of
projective clustering—to deal with high dimensionality—and clustering ensembles—
to handle multiple clustering solutions. The goal of projective clustering [9] is to
discover projective clusters, i.e., subsets of the input data having different, and possibly
overlapping subsets of features (subspaces) associated with them. Projective clustering
aims to solve issues that typically arise in high-dimensional data, such as sparsity and
concentration of distances [
        <xref ref-type="bibr" rid="ref1">7</xref>
        ]. The problem of clustering ensembles [12], also known
as consensus clustering or aggregation clustering, is stated as follows: given a set of
clustering solutions, or ensemble, one must derive a consensus clustering that properly
summarizes the solutions of the ensemble. The input ensemble is typically generated by
varying one or more aspects of the clustering process, such as the clustering algorithm,
the parameter setting, and the number of features, objects or clusters.
      </p>
      <p>Projective clustering and clustering ensembles have recently been treated in a
unified framework [2,3,4,5,6]. The underlying motivation of that study is that many
real-world problems are high-dimensional and lack a-priori knowledge. Examples include
clustering of multi-view data, privacy preserving clustering, news or document retrieval
based on pre-defined categorizations, and distributed clustering of high-dimensional
data. To address both issues simultaneously, the problem of Projective Clustering
Ensembles (PCE) is hence formalized, whose goal is to compute a projective consensus
clustering from an ensemble of projective clustering solutions. Intuitively, each
projective cluster is characterized by a distribution of memberships of the objects as well
as a distribution over the features that belong to the subspace of that cluster. Figure 1
illustrates an example of projective clustering ensemble with three projective
clustering solutions, which are obtained according to different views over the same dataset.
A projective cluster is graphically represented as a rectangle filled with a color
gradient, where higher intensities correspond to larger membership values of objects to
the cluster. Clusters of the same clustering may overlap with their gradient (i.e.,
objects can have multiple assignments with different degrees of membership), and colors
change to denote that different groupings of objects are associated with different feature
subspaces. A projective consensus clustering is derived by suitably “aggregating” the
ensemble members: the first projective consensus cluster is derived by summarizing C1',
C2'', and C2''', the second from C2', C3'', and C3''', and the third from C4', C1'', and C1'''.
Note that the resulting color in each projective consensus cluster would merge colors
in the original projective clusters, which means that a projective consensus cluster is
associated with a subset of features shared by the objects in the original clusters.</p>
      <p>In this paper we provide an overview of the PCE problem. We discuss how PCE
was originally formalized as either a two-objective or a single-objective
optimization problem, and how the limitations of the early approaches have been overcome
by a metacluster-based method that is able to jointly consider the object-based and
feature-based cluster representations in a single-objective optimization problem. We
also summarize main empirical results obtained on benchmark datasets. We finally
provide pointers for future research in such a challenging context.</p>
    </sec>
    <sec id="sec-2">
      <title>Projective Clustering Ensembles (PCE)</title>
      <p>Let $\mathcal{D}$ be a set of data objects, where each $o \in \mathcal{D}$ is an $|\mathcal{F}|$-dimensional point defined
over a feature space $\mathcal{F}$ (vectorial notation here denotes row vectors). A projective cluster $C$ defined over $\mathcal{D}$ is a pair $\langle \Gamma_C, \Delta_C \rangle$. $\Gamma_C$,
termed the object-based representation of $C$, is a $|\mathcal{D}|$-dimensional real-valued vector whose
components $\Gamma_{C,o} \in [0,1]$, $\forall o \in \mathcal{D}$, represent the object-to-cluster assignment
of $o$ to $C$, i.e., the probability $\Pr(C|o)$ that the object $o$ belongs to $C$. $\Delta_C$, termed
the feature-based representation of $C$, is an $|\mathcal{F}|$-dimensional real-valued vector whose
components $\Delta_{C,f} \in [0,1]$, $\forall f \in \mathcal{F}$, represent the feature-to-cluster assignment of
the feature $f$ to $C$, i.e., the probability $\Pr(f|C)$ that the feature $f$ is informative for
cluster $C$ ($f$ belongs to the subspace associated with $C$).</p>
      <p>The object-based ($\Gamma_C$) and the feature-based ($\Delta_C$) representations of a projective
cluster $C$ implicitly define the projective cluster representation matrix (for short,
projective matrix) $X_C$ of $C$. $X_C$ is a $|\mathcal{D}| \times |\mathcal{F}|$ matrix that stores, $\forall o \in \mathcal{D}, f \in \mathcal{F}$, the
probability of the intersection of the events “object $o$ belongs to $C$” and “feature $f$
belongs to the subspace associated with $C$”. Under the assumption of independence
between the two events, such a probability is equal to the product of $\Pr(C|o) = \Gamma_{C,o}$
and $\Pr(f|C) = \Delta_{C,f}$. Hence, given $\mathcal{D} = \{o_1, \ldots, o_{|\mathcal{D}|}\}$ and $\mathcal{F} = \{1, \ldots, |\mathcal{F}|\}$, the matrix
$X_C$ can formally be defined as $X_C = \Gamma_C^{\mathsf{T}} \Delta_C$.</p>
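      <p>As an illustration of the above definition, the following minimal NumPy sketch (ours, with hypothetical toy values) builds the projective matrix $X_C$ of a single projective cluster as the outer product of its object-based and feature-based representations:</p>
      <preformat>
import numpy as np

# Object-based representation Gamma_C: Pr(C|o) for each of 4 objects (toy values).
gamma_C = np.array([0.9, 0.7, 0.2, 0.1])

# Feature-based representation Delta_C: Pr(f|C) over 3 features (sums to 1).
delta_C = np.array([0.5, 0.3, 0.2])

# Projective matrix X_C = Gamma_C^T Delta_C (a |D| x |F| outer product):
# entry (o, f) is the probability that o belongs to C AND f is in C's subspace,
# under the independence assumption discussed above.
X_C = np.outer(gamma_C, delta_C)

print(X_C.shape)  # (4, 3)
      </preformat>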
      <p>A projective clustering solution, denoted by $\mathcal{C}$, is defined as a set of projective
clusters that satisfy $\sum_{C \in \mathcal{C}} \Gamma_{C,o} = 1$, $\forall o \in \mathcal{D}$, and $\sum_{f \in \mathcal{F}} \Delta_{C,f} = 1$, $\forall C \in \mathcal{C}$. The
semantics of any projective clustering $\mathcal{C}$ is that, for each projective cluster $C \in \mathcal{C}$, the
objects belonging to $C$ are close to each other if (and only if) they are projected onto
the subspace associated with $C$.</p>
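      <p>As a quick illustration of these two constraints, the following helper (ours; each projective cluster is stored as a (Gamma, Delta) pair of NumPy arrays) checks whether a candidate solution satisfies them:</p>
      <preformat>
import numpy as np

def is_valid_solution(clusters, tol=1e-9):
    """Check that the Gamma values of the clusters sum to 1 for every object,
    and that each Delta vector sums to 1 over the features."""
    gammas = np.stack([g for (g, _) in clusters])        # |C| x |D| matrix
    objects_ok = np.allclose(gammas.sum(axis=0), 1.0, atol=tol)
    features_ok = all(np.isclose(d.sum(), 1.0, atol=tol) for (_, d) in clusters)
    return objects_ok and features_ok
      </preformat>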
      <p>A projective ensemble $\mathcal{E}$ is defined as a set of projective clustering solutions. Neither
information about the ensemble generation strategy (algorithms and/or setups) nor the
original feature values of the objects within $\mathcal{D}$ is provided along with $\mathcal{E}$. Moreover, each
projective clustering solution in $\mathcal{E}$ may in general contain a different number of clusters.</p>
      <p>The goal of PCE is to derive a projective consensus clustering that properly
summarizes the projective clustering solutions within the input projective ensemble.</p>
      <sec id="sec-2-1">
        <title>4 Vectorial notation here denotes row vectors.</title>
        <p>2.1</p>
        <sec id="sec-2-1-1">
          <title>Single-objective PCE</title>
        <p>The earliest PCE formulation proposed in [5] is based on a single-objective function:
\[ \mathcal{C}^* = \arg\min\nolimits_{\mathcal{C}} \sum_{C \in \mathcal{C}} \sum_{o \in \mathcal{D}} \Gamma_{C,o}^{\alpha} \, X_{C,o}, \tag{1} \]
where
\[ X_{C,o} = \sum_{f \in \mathcal{F}} (\Delta_{C,f} - \lambda_{o,f})^2, \qquad \lambda_{o,f} = \frac{1}{|\mathcal{E}|} \sum_{\hat{\mathcal{C}} \in \mathcal{E}} \sum_{\hat{C} \in \hat{\mathcal{C}}} \Gamma_{\hat{C},o} \, \Delta_{\hat{C},f}, \tag{2} \]
and $\alpha &gt; 1$ is a positive integer. To solve the above optimization problem, the
EM-based Projective Clustering Ensembles (EM-PCE) method is proposed in [5]. EM-PCE
iteratively looks for the optimal values of $\Gamma_{C,o}$ (resp. $\Delta_{C,f}$) while keeping
$\Delta_{C,f}$ (resp. $\Gamma_{C,o}$) fixed, until convergence.</p>
        <p>Weaknesses of single-objective PCE. The objective function in (1) does not allow for
a perfect balance between object- and feature-to-cluster assignments when measuring
the error of a candidate projective consensus clustering solution. This weakness is
formally shown in [3] and avoided by adjusting (1) with a corrective term. The resulting
formulation of the problem based on the corrected objective function is the following:
\[ \mathcal{C}^* = \arg\min\nolimits_{\mathcal{C}} \sum_{C \in \mathcal{C}} \sum_{o \in \mathcal{D}} \Gamma_{C,o}^{\alpha} \left( \frac{1}{2|\mathcal{E}|} X_{C,o} + \frac{1}{|\mathcal{D}|} \sum_{o' \neq o} \Gamma_{C,o'}^{\alpha} \Big( 1 - \frac{1}{|\mathcal{E}|} \sum_{\hat{\mathcal{C}} \in \mathcal{E}} \sum_{\hat{C} \in \hat{\mathcal{C}}} \Gamma_{\hat{C},o} \, \Gamma_{\hat{C},o'} \Big) \right). \tag{3} \]</p>
          <p>The above optimization problem is tackled in [3] by proposing two different
methods. The first one, called E-EM-PCE, follows the same scheme as the EM-PCE
algorithm for the early single-objective PCE formulation. The second method, dubbed
E-2S-PCE, consists of two sequential steps that handle the object-to-cluster and the
feature-to-cluster assignments separately.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>Two-objective PCE</title>
          <p>PCE is formulated in [5] also as a two-objective problem, whose functions account for
the object-based (function $\psi_o$) and the feature-based (function $\psi_f$) representations:
\[ \mathcal{C}^* = \arg\min\nolimits_{\mathcal{C}} \, \{ \psi_o(\mathcal{C}, \mathcal{E}), \; \psi_f(\mathcal{C}, \mathcal{E}) \}, \tag{4} \]
where
\[ \psi_o(\mathcal{C}, \mathcal{E}) = \sum_{\hat{\mathcal{C}} \in \mathcal{E}} \psi_o(\mathcal{C}, \hat{\mathcal{C}}), \qquad \psi_f(\mathcal{C}, \mathcal{E}) = \sum_{\hat{\mathcal{C}} \in \mathcal{E}} \psi_f(\mathcal{C}, \hat{\mathcal{C}}). \tag{5} \]
Functions $\psi_o$ and $\psi_f$ are defined as $\psi_o(\mathcal{C}', \mathcal{C}'') = \frac{1}{2} (\bar{\psi}_o(\mathcal{C}', \mathcal{C}'') + \bar{\psi}_o(\mathcal{C}'', \mathcal{C}'))$ and
$\psi_f(\mathcal{C}', \mathcal{C}'') = \frac{1}{2} (\bar{\psi}_f(\mathcal{C}', \mathcal{C}'') + \bar{\psi}_f(\mathcal{C}'', \mathcal{C}'))$, respectively, where
\[ \bar{\psi}_o(\mathcal{C}', \mathcal{C}'') = \frac{1}{|\mathcal{C}'|} \sum_{C' \in \mathcal{C}'} \Big( 1 - \max_{C'' \in \mathcal{C}''} J(\Gamma_{C'}, \Gamma_{C''}) \Big), \]
$\bar{\psi}_f$ is defined analogously on the feature-based representations $\Delta$, and
$J(\mathbf{u}, \mathbf{v}) = \mathbf{u} \mathbf{v}^{\mathsf{T}} / (\| \mathbf{u} \|_2^2 + \| \mathbf{v} \|_2^2 - \mathbf{u} \mathbf{v}^{\mathsf{T}})$ denotes the Tanimoto coefficient.</p>
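          <p>For illustration, the Tanimoto coefficient $J$ and the asymmetric distance $\bar{\psi}_o$ can be transcribed directly (our own sketch, not code from [5]; clusters are again (Gamma, Delta) pairs of NumPy arrays):</p>
          <preformat>
import numpy as np

def tanimoto(u, v):
    """Tanimoto coefficient J(u, v) = u.v / (||u||^2 + ||v||^2 - u.v)."""
    dot = float(np.dot(u, v))
    return dot / (np.dot(u, u) + np.dot(v, v) - dot)

def psi_o_bar(c1, c2):
    """bar-psi_o(C', C''): average, over the clusters of C', of one minus the
    best Tanimoto match among the clusters of C'' (object-based views)."""
    return np.mean([1 - max(tanimoto(g1, g2) for (g2, _) in c2)
                    for (g1, _) in c1])

def psi_o(c1, c2):
    """Symmetrized version used in (5)."""
    return 0.5 * (psi_o_bar(c1, c2) + psi_o_bar(c2, c1))
          </preformat>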
          <p>The problem defined in (4) is solved by the MOEA-PCE method, in which a
Pareto-based Multi-Objective Evolutionary Algorithm is exploited to avoid combining the two
objective functions into a single one.</p>
        <p>Weaknesses of two-objective PCE. Experimental evidence in [5] has shown that the
two-objective PCE formulation is more accurate than the single-objective one.
Nevertheless, the original two-objective PCE still suffers from an important conceptual
issue: it does not take into consideration the interrelation between the object-based
and the feature-based cluster representations. This can be overcome by employing a
metacluster-based PCE formulation, which we discuss next.</p>
        <sec id="sec-2-2-1">
          <title>Metacluster-based PCE</title>
          <p>Metacluster-based PCE is defined in terms of the following single-objective function [4]:
\[ \mathcal{C}^* = \arg\min\nolimits_{\mathcal{C}} \, \psi_{of}(\mathcal{C}, \mathcal{E}), \tag{6} \]
where $\psi_{of}$ is a function designed to measure the “distance” of a projective clustering
solution $\mathcal{C}$ from $\mathcal{E}$ in terms of both data clustering and feature-to-cluster assignment:
$\psi_{of}(\mathcal{C}, \mathcal{E}) = \sum_{\hat{\mathcal{C}} \in \mathcal{E}} \psi_{of}(\mathcal{C}, \hat{\mathcal{C}})$, with
$\psi_{of}(\mathcal{C}', \mathcal{C}'') = \frac{1}{2} (\bar{\psi}_{of}(\mathcal{C}', \mathcal{C}'') + \bar{\psi}_{of}(\mathcal{C}'', \mathcal{C}'))$,
$\bar{\psi}_{of}(\mathcal{C}', \mathcal{C}'') = \frac{1}{|\mathcal{C}'|} \sum_{C' \in \mathcal{C}'} \big( 1 - \max_{C'' \in \mathcal{C}''} \hat{J}(X_{C'}, X_{C''}) \big)$, and $\hat{J}$ is the Tanimoto
coefficient between the linearized versions of the matrices to be compared.</p>
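          <p>The matrix-level coefficient $\hat{J}$ reduces to the vector Tanimoto coefficient applied to flattened projective matrices; a short self-contained sketch (ours):</p>
          <preformat>
import numpy as np

def tanimoto_matrix(X1, X2):
    """Tanimoto coefficient J-hat between linearized projective matrices."""
    u, v = X1.ravel(), X2.ravel()
    dot = float(np.dot(u, v))
    return dot / (np.dot(u, u) + np.dot(v, v) - dot)
          </preformat>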
          <p>The PCE formulation reported in (6) is well-suited to measure the quality of a
candidate consensus clustering in terms of both object-to-cluster and feature-to-cluster
assignments as a whole. As shown in [4], this enables us to overcome the conceptual
disadvantages of both early single-objective and two-objective PCE.</p>
          <p>To solve the metacluster-based PCE problem, a two-step method is defined in [4]. The
method, called CB-PCE, first discovers a suitable metacluster structure, and then
computes optimal object- and feature-based representations of each metacluster.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>Enhanced metacluster-based PCE</title>
          <p>Although the early metacluster-based PCE formulation in (6) mitigates the issues of
two-objective PCE, such a formulation still has some limitations. By some
rearrangement and omitting constant terms, the objective function in (6) can be rewritten as:
\[ \psi_{of}(\mathcal{C}, \mathcal{E}) = \underbrace{\sum_{\hat{\mathcal{C}} \in \mathcal{E}} \sum_{\hat{C} \in \hat{\mathcal{C}}} \min_{C \in \mathcal{C}} \big( 1 - \hat{J}(X_C, X_{\hat{C}}) \big)}_{\psi'_{of}(\mathcal{C}, \mathcal{E})} + \underbrace{\sum_{C \in \mathcal{C}} \sum_{\hat{\mathcal{C}} \in \mathcal{E}} \min_{\hat{C} \in \hat{\mathcal{C}}} \big( 1 - \hat{J}(X_C, X_{\hat{C}}) \big)}_{\psi''_{of}(\mathcal{C}, \mathcal{E})}. \tag{7} \]
As formally shown in [6], both $\psi'_{of}$ and $\psi''_{of}$ suffer from conceptual drawbacks: the
optimization of $\psi'_{of}$ favors projective clustering solutions composed of clusters coming all
from the same input ensemble solution, while the optimization of $\psi''_{of}$ favors the
replication of the same projective cluster within the output consensus projective clustering.</p>
          <p>To solve these issues, an enhanced metacluster-based PCE formulation is proposed
in [6]: given a projective ensemble $\mathcal{E}$ defined over objects $\mathcal{D}$ and features $\mathcal{F}$, and an
integer $K &gt; 0$, find a projective clustering solution $\mathcal{C}^*$ such that $|\mathcal{C}^*| = K$ and
\[ \mathcal{C}^* = \arg\min\nolimits_{\mathcal{C}} \sum_{C \in \mathcal{C}} \sum_{\hat{\mathcal{C}} \in \mathcal{E}} \sum_{\hat{C} \in \hat{\mathcal{C}}} x(\hat{C}, C) \, T(\hat{C}, C) \tag{8} \]
subject to
\[ \sum_{C \in \mathcal{C}} \Gamma_{C,o} = 1, \;\; \forall o \in \mathcal{D}; \qquad \sum_{f \in \mathcal{F}} \Delta_{C,f} = 1, \;\; \forall C \in \mathcal{C}; \tag{9} \]
\[ x(\hat{C}, C) \in \{0, 1\}, \qquad \sum_{C \in \mathcal{C}} x(\hat{C}, C) \leq 1, \qquad \sum_{\hat{C} \in \hat{\mathcal{C}}} x(\hat{C}, C) \geq 1, \qquad \forall \hat{\mathcal{C}} \in \mathcal{E}, \; \forall \hat{C} \in \hat{\mathcal{C}}, \; \forall C \in \mathcal{C}, \tag{10} \]
where $T(\hat{C}, C) = 1 - \hat{J}(X_{\hat{C}}, X_C)$ is the Tanimoto-based distance between the projective
matrices of $\hat{C}$ and $C$.</p>
          <p>The above (NP-hard [6]) problem is solved by E-CB-PCE, i.e., a more effective
and more efficient variant of the method devised for early metacluster-based PCE.</p>
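          <p>The optimization in (8)-(10) is a constrained assignment problem; the actual E-CB-PCE strategy is described in [6]. Purely as an illustration of the constraint structure, the following greedy sketch (ours, not the authors' algorithm; it assumes the tanimoto_matrix helper above and represents each cluster by its projective matrix) assigns ensemble clusters to K seed clusters:</p>
          <preformat>
def greedy_metaclusters(ensemble, seeds):
    """Greedily assign each ensemble cluster (given by its projective matrix)
    to one of the K seed matrices: every cluster joins no more than one
    metacluster, and each metacluster takes no more than one cluster from
    each ensemble solution, mirroring the x(.,.) constraints in (10)."""
    metaclusters = {k: [] for k in range(len(seeds))}
    for solution in ensemble:            # one projective clustering of E
        taken = set()                    # seeds already used for this solution
        for X_hat in solution:           # projective matrix of one cluster
            free = [k for k in range(len(seeds)) if k not in taken]
            if not free:
                break                    # more clusters than seeds: leave unassigned
            # closest still-free seed under the distance T = 1 - J_hat
            k = min(free, key=lambda k: 1 - tanimoto_matrix(X_hat, seeds[k]))
            metaclusters[k].append(X_hat)
            taken.add(k)
    return metaclusters
          </preformat>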
        </sec>
      </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>In this section we report the main experimental findings on the various state-of-the-art
PCE methods, i.e., MOEA-PCE [2,5], EM-PCE [2,5], E-EM-PCE [3], E-2S-PCE [3],
CB-PCE [4], and E-CB-PCE [6].</p>
      <p>Datasets. The evaluation was performed on 22 publicly-available benchmark datasets,
whose main characteristics are shown in Table 1. The first 15 datasets are from the UCI
Machine Learning Repository (http://archive.ics.uci.edu/ml), the next four are from the
UCR Time Series Classification/Clustering Page (http://www.cs.ucr.edu/~eamonn/time_series_data),
whereas the last three are synthetic datasets from http://dme.rwth-aachen.de/en/OpenSubspace/evaluation.</p>
      <p>[Table 1 here: main characteristics of the selected datasets: Iris, Wine, Glass, Ecoli, Yeast, Mult.-Feat., Segmentation, Abalone, Waveform, Letter, Isolet, Gisette, p53-Mutants, Amazon, Arcene, Shapes, Tracedata, ControlChart, Twopat, N30, D75, S2500.]</p>
      <p>Projective-ensemble generation. Projective ensembles were generated by running the
well-established projective-clustering algorithm LAC [1] on the selected datasets. The
diversity of the projective clustering solutions was ensured by randomly choosing the
initial centroids and varying LAC’s parameter h. For each dataset, we generated 10
different projective ensembles; all results were averaged over these 10 ensembles.</p>
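      <p>For illustration, a generation loop of this kind can be sketched as follows; run_lac is a hypothetical wrapper around a LAC implementation (no such function ships with [1]), and the grid of h values is assumed:</p>
      <preformat>
import random

def generate_ensembles(dataset, num_ensembles=10, ensemble_size=10):
    """Build projective ensembles for one dataset by perturbing LAC:
    random initial centroids plus a varying h parameter."""
    ensembles = []
    for _ in range(num_ensembles):
        ensemble = []
        for _ in range(ensemble_size):
            h = random.choice([1, 2, 4, 8, 16])   # assumed grid of h values
            seed = random.randrange(10**6)        # randomized centroid init
            # run_lac is hypothetical; it should return a projective
            # clustering solution, i.e., a list of (Gamma, Delta) pairs
            ensemble.append(run_lac(dataset, h=h, seed=seed))
        ensembles.append(ensemble)
    return ensembles
      </preformat>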
      <p>Assessment criteria. The quality of a projective consensus clustering $\mathcal{C}^*$ was assessed by
using both external and internal evaluations: the former compares $\mathcal{C}^*$ against a reference
classification, whereas the latter is based on the average similarity w.r.t. the solutions
of the input projective ensemble. Three variants of the classic F1-measure were used as
external criteria, aiming at comparing projective clustering solutions in terms of their
object-based representation ($F1_o$), feature-based representation ($F1_f$), or both ($F1_{of}$),
respectively. The same criteria were used for internal evaluation too. The difference in
this case is that the target consensus clustering $\mathcal{C}^*$ is compared against all solutions in the
input ensemble, and an aggregated score is ultimately derived from all individual scores.
Again, three variants were employed, accounting for the object-based representation ($\widehat{F1}_o$),
feature-based representation ($\widehat{F1}_f$), or both ($\widehat{F1}_{of}$), respectively. For a more detailed
definition of the $F1_o$, $F1_f$, $F1_{of}$, $\widehat{F1}_o$, $\widehat{F1}_f$, and $\widehat{F1}_{of}$ criteria, please refer to [6].</p>
      <p>Results. Table 2 shows the accuracy of all methods for each assessment criterion,
averaged on all the selected datasets (dataset-by-dataset scores are available in [6]). In particular, the table shows the average gain achieved
by the E-CB-PCE method, which was recognized as the most accurate one. E-CB-PCE
was more accurate than both MOEA-PCE and CB-PCE on 16 out of 22 datasets on
average, while achieving average gains up to 0.147 ($F1_{of}$ assessment criterion) and
0.078 ($F1_f$ assessment criterion) w.r.t. MOEA-PCE and CB-PCE, respectively. Larger
improvements were produced by E-CB-PCE w.r.t. the early single-objective PCE
methods, i.e., EM-PCE, E-EM-PCE, and E-2S-PCE.</p>
      <p>Table 3 shows the runtimes (in milliseconds) of the various algorithms involved in
the comparison. E-CB-PCE was slower than EM-PCE on most datasets, while clearly
outperforming MOEA-PCE. The runtimes of E-CB-PCE were one or two orders of
magnitude smaller than those of MOEA-PCE on average, up to four orders on Isolet.
Only on one dataset (Glass) was MOEA-PCE more efficient than E-CB-PCE, even
though the runtimes of the two methods remained of the same order of magnitude.
Compared to CB-PCE, E-CB-PCE was faster on 11 out of 22 datasets.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
      <p>We provided an overview of research work on projective clustering ensembles (PCE).
We discussed the major reasons behind the formulation of PCE as a new topic in the
data clustering field, which required novel computational approaches and algorithmic
solutions. We formally presented a summary of existing formulations of PCE, and
reported major findings learned from experimental evaluation studies.</p>
      <p>Our research work paved the way for the development of data clustering
frameworks that can support a variety of current and emerging applications in several
domains of data management and analysis. For instance, in the complex network systems
area, one interesting direction is to exploit PCE for addressing problems of
dimensionality reduction and community detection in time-evolving attributed networks. Another
problem that can benefit from our framework is outlier detection in high-dimensional
data, which shares many of the same challenges as clustering. Some work has been done
on outlier ensembles (e.g., [10]) and on subspace outlier detection (e.g., [11]), but a
unified framework encompassing both does not exist.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hinneburg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aggarwal</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keim</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          :
          <article-title>What Is the Nearest Neighbor in High Dimen8</article-title>
          .
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Algorithms for Clustering Data</article-title>
          . Prentice-Hall (
          <year>1988</year>
          )
          <article-title>9</article-title>
          .
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.P.</given-names>
          </string-name>
          , Kro¨ ger,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Zimek</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering</article-title>
          .
          <source>TKDD</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ), 1:
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          :
          <fpage>58</fpage>
          (
          <year>2009</year>
          )
          <fpage>10</fpage>
          .
          <string-name>
            <surname>Rayana</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akoglu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Less is more: Building selective anomaly ensembles</article-title>
          .
          <source>TKDD</source>
          <volume>10</volume>
          (
          <issue>4</issue>
          ),
          <fpage>11</fpage>
          .
          <string-name>
            <surname>Sathe</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aggarwal</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          :
          <article-title>Subspace outlier detection in linear time with randomized hashing</article-title>
          .
          <source>In: Proc. IEEE Int. Conf. on Data Mining (ICDM)</source>
          . pp.
          <fpage>459</fpage>
          -
          <lpage>468</lpage>
          (
          <year>2016</year>
          )
          <fpage>12</fpage>
          .
          <string-name>
            <surname>Strehl</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghosh</surname>
            ,
            <given-names>J.: Cluster</given-names>
          </string-name>
          <string-name>
            <surname>Ensembles - A Knowledge Reuse</surname>
          </string-name>
          <article-title>Framework for Combining</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>