A Lattice-Based Consensus Clustering Algorithm

Artem Bocharov, Dmitry Gnatyshak, Dmitry I. Ignatov, Boris G. Mirkin, and Andrey Shestakov

National Research University Higher School of Economics
dignatov@hse.ru
http://www.hse.ru

Abstract. We propose a new algorithm for consensus clustering, FCA-Consensus, based on Formal Concept Analysis. As input, the algorithm takes T partitions of a certain set of objects obtained by the k-means algorithm after T runs from different initialisations. The resulting consensus partition is extracted from an antichain of the concept lattice built on a formal context objects × classes, where the classes are the set of all cluster labels from each initial k-means partition. We compare the results of the proposed algorithm, in terms of the ARI measure, with those of state-of-the-art algorithms on synthetic datasets. Under certain conditions, the best ARI values are demonstrated by FCA-Consensus.

Keywords: consensus clustering, k-means, Formal Concept Analysis, ensemble clustering, lattice-based clustering

1 Introduction and related work

Although the subject of consensus partition has been considered in the literature as early as the 1960s ([1], [2]), its popularity stems from concerns of the 21st century, when clustering has become a ubiquitous activity. An innocent user wants to segment their data into homogeneous segments, a.k.a. clusters; they apply clustering tools and see many different solutions whose comparative merits are not clear. Therefore, they need a tool to reconcile all the clusterings produced by different tools, or even by the same tool at different parameter values.

As input, the consensus clustering approach usually takes T partitions of a certain set of objects obtained, for example, by the k-means algorithm after T different executions with possibly different k. The resulting consensus partition is built from the matrix objects × classes, where the classes are the set of all cluster labels from each initial k-means partition. Thus, the main goal of consensus clustering is to find (recover) an optimal partition, i.e. to guess the proper number of resulting clusters and put the objects into each part correctly (see, for example, [3], [4]). To evaluate a consensus clustering method, researchers usually hypothesise that if a particular consensus clustering approach is able to find a proper k and attain high accuracy on pre-labeled datasets, then it can be used in the unsupervised setting.

In [5], consensus clustering algorithms are classified into three main groups: probabilistic approaches [6,7]; direct approaches [3,8,9,10]; and pairwise similarity-based approaches [11,12]. In the last category of methods, the (i, j)-th entry a_ij of the consensus matrix A = (a_ij) shows the number of partitions in which objects g_i and g_j belong to the same cluster (a small illustrative sketch is given at the end of this section).

In the previous papers [13,14], the least-squares consensus clustering approach of [15] was revisited, equipped with a more recent clustering procedure, and compared, on synthetic data of Gaussian clusters, with more recent methods. Here, our main goal is to propose a novel lattice-based consensus clustering algorithm by means of FCA and show its competitive applicability. To the best of our knowledge, a variant of an FCA-based consensus approach was first proposed for clustering genes [16]. Those who are interested in theoretical properties of consensus procedures and their relations to FCA are referred to [17].
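To make the pairwise similarity-based family concrete, the following minimal sketch computes such a consensus (co-association) matrix A = (a_ij) from a T × N matrix of ensemble labels. The sketch is ours, not taken from [11,12]; the names co_association and labels are illustrative.

import numpy as np

def co_association(labels):
    """a_ij = number of partitions in which objects i and j co-cluster.

    labels: (T, N) integer array; row t is the t-th ensemble partition,
    given as one cluster label per object.
    """
    T, N = labels.shape
    A = np.zeros((N, N), dtype=int)
    for row in labels:
        # N x N boolean matrix: True where two objects agree in this run
        A += row[:, None] == row[None, :]
    return A

# Three partitions of four objects; the diagonal equals T = 3.
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 0, 1],
                   [1, 1, 0, 0]])
print(co_association(labels))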
The paper is organised in five sections. In Section 2, we recall some definitions from FCA, introduce partitions and the lattice of partitions, and show how any partition lattice can be mapped to a concept lattice. In Section 3, we introduce our modification of the Close-by-One algorithm for consensus clustering. In Section 4, we describe our experimental results over synthetic data, both for individual properties of FCA-Consensus and for comparison with state-of-the-art clustering methods. Section 5 concludes the paper and outlines prospective ways of research and development.

2 Basic definitions

First, we recall several notions related to lattices and partitions.

Definition 1. A partition σ of a nonempty set A is a set of its subsets, σ = {B | B ⊆ A}, such that ⋃_{B∈σ} B = A and B ∩ C = ∅ for all distinct B, C ∈ σ. Every element of σ is called a block.

Definition 2. A partition lattice of a set A is an ordered set (Part(A), ∨, ∧), where Part(A) is the set of all possible partitions of A and, for all partitions σ and ρ, supremum and infimum are defined as follows:

σ ∨ ρ = { ⋃_{C ∈ N_ρ(B)} (N_ρ(B) ∪ N_σ(C)) | B ∈ σ },

σ ∧ ρ = { B ∩ C | B ∈ σ and C ∈ ρ },

where N_ρ(B) = {C ∈ ρ | B ∩ C ≠ ∅} for B ∈ σ, and N_σ(C) = {B ∈ σ | B ∩ C ≠ ∅} for C ∈ ρ.

Definition 3. Let A be a set and let ρ, σ ∈ Part(A). The partition ρ is finer than the partition σ, written ρ ≤ σ, if every block B of σ is a union of blocks of ρ. Equivalently, one can use the traditional connection between supremum, infimum, and the partial order in the lattice: ρ ≤ σ iff ρ ∨ σ = σ (equivalently, ρ ∧ σ = ρ).

Now, we recall some basic notions of Formal Concept Analysis (FCA) [18]. Let G and M be sets, called the set of objects and the set of attributes, respectively, and let I be a relation I ⊆ G × M: for g ∈ G, m ∈ M, gIm holds iff the object g has the attribute m. The triple K = (G, M, I) is called a (formal) context. For arbitrary subsets A ⊆ G and B ⊆ M, the Galois connection is given by the following derivation operators:

A′ = {m ∈ M | gIm for all g ∈ A},   (1)
B′ = {g ∈ G | gIm for all m ∈ B}.

The pair (A, B), where A ⊆ G, B ⊆ M, A′ = B, and B′ = A, is called a (formal) concept (of the context K) with extent A and intent B (in this case we also have A″ = A and B″ = B). The concepts, ordered by (A₁, B₁) ≥ (A₂, B₂) ⟺ A₁ ⊇ A₂, form a complete lattice, called the concept lattice B(G, M, I).

Theorem 1 (Ganter & Wille [18]). For a given partially ordered set P = (P, ≤), the concept lattice of the formal context K = (J(P), M(P), ≤) is isomorphic to the Dedekind–MacNeille completion of P, where J(P) and M(P) are the sets of join-irreducible and meet-irreducible elements of P, respectively.

Theorem 2 (this paper). For a given partition lattice L = (Part(A), ∨, ∧), there exists a formal context K = (P₂, A₂, I), where P₂ = {{a, b} | a, b ∈ A and a ≠ b}, A₂ = {σ | σ ∈ Part(A) and |σ| = 2}, and {a, b}Iσ when a and b belong to the same block of σ. The concept lattice B(P₂, A₂, I) is isomorphic to the initial lattice (Part(A), ∨, ∧).

Proof. According to Theorem 1, the concept lattice of the context K_DM = (J(L), M(L), ≤) is isomorphic to the Dedekind–MacNeille completion of L. The Dedekind–MacNeille completion of a lattice is isomorphic to the lattice itself, by definition (as a minimal completion which forms a lattice). So, we have to show that the contexts K and K_DM (or their concept lattices) are isomorphic.
E.g., from [19] (Lemma 1, Chapter 4, Partition Lattices), we have that the atoms of a partition lattice are those of its partitions which have only one block of two elements, the remaining blocks being singletons, and its coatoms are partitions into two blocks. It is evident that all the atoms are join-irreducible, all the coatoms are meet-irreducible, and there are no other irreducible elements of the partition lattice L.

Let σ and ρ be two partitions from L, σ ∈ J(L) and ρ ∈ M(L), with σ ≤ ρ. This means that all blocks of σ are subsets of blocks of ρ, and the non-trivial block {i, j} ∈ σ is a subset of one of the blocks of ρ. Note that A₂ coincides with the coatom set. It directly implies that {i, j}Iρ iff the atom σ with block {i, j} is finer than the coatom ρ. In addition, we can show the correspondence between the elements of L = (Part(A), ∨, ∧) and the formal concepts of B(P₂, A₂, I). Every (A, B) ∈ B(P₂, A₂, I) corresponds to σ = ⋀B, and every pair {i, j} from A is in one of the blocks of σ, where σ ∈ Part(A). Every (A, B) ∈ B_DM(J(L), M(L), ≤) corresponds to σ = ⋀B = ⋁A.

Example 1. In Fig. 1, one can see the diagram of a concept lattice isomorphic to the partition lattice of a 4-element set.

Fig. 1. The line diagram of a concept lattice isomorphic to the partition lattice of a 4-element set (reduced labeling).

3 FCA-Consensus: adding objects one by one

To work in FCA terms, we need to introduce a (formal) partition context that corresponds to the matrix objects × classes from the introduction. Let us consider such a context K_R = (G, ⊔_t M_t, I ⊆ G × ⊔_t M_t), where G is a set of objects, t = 1, . . . , T, and each M_t consists of the labels of all clusters in the t-th k-means partition from the ensemble (an illustrative construction of K_R is sketched before Algorithm 1 below). For example, gIm_t1 means that object g has been assigned to the first cluster by the t-th clustering algorithm of the ensemble.

Our FCA-Consensus algorithm looks for S, an antichain of concepts of K_R, such that for every (A, B) and (C, D) in S the condition A ∩ C = ∅ is fulfilled. Here, the concept extent A corresponds to one of the resulting clusters, and its intent contains all labels of the ensemble members that voted for the objects from A being in one cluster. The input cluster sizes may vary, but it is a reasonable consensus hypothesis that at least ⌈T/2⌉ members should vote for a set of objects to be in a cluster. One can prove the theorem below, where by the true partition we mean the original partition into clusters to be recovered.

Theorem 3. In the concept lattice of a partition context K_R = (G, ⊔_t M_t, I ⊆ G × ⊔_t M_t), there is an antichain of concepts S such that all the extents of its concepts A_i coincide with the blocks S_i of σ, the true partition, if and only if S_i″ = S_i, where i = 1, . . . , |σ|.

Proof. The proof follows from the facts that the blocks of a partition are non-intersecting and that each block should be closed to form a concept extent.

In fact, this happens if all ensemble algorithms have voted for all objects from S_i to belong to the same concept (cluster). However, this is a rather strong requirement, and we should experimentally study good candidates for such an antichain. The algorithm below works incrementally by adding objects one by one and checking a new "canonicity" condition, as in the algorithms ADDI [11] and Close by One (CbO) [20]. Here the stopping condition is of course different: it is |Y| ≥ ⌈T/2⌉, where Y is the intent of the current concept. Moreover, the objects covered at a particular step should not be added with any concept to the antichain S later.
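Before turning to the pseudocode, here is a minimal sketch, ours rather than the paper's, of how the partition context K_R and the derivation operators (·)′ of Eq. (1) could be materialised from the ensemble's label matrix; the function and variable names are illustrative. An attribute (t, c) holds for an object g iff the t-th clusterer put g into cluster c; then gp = g′ and gpp = g″ in Algorithm 1 below correspond to extent_prime({g}, M, I) and intent_prime(extent_prime({g}, M, I), G, I).

def partition_context(labels):
    # K_R = (G, ⊔_t M_t, I) from a list of T label lists over N objects
    G = set(range(len(labels[0])))
    I = {(g, (t, row[g])) for t, row in enumerate(labels) for g in G}
    M = {m for (_, m) in I}
    return G, M, I

def extent_prime(A, M, I):
    # A′ = cluster labels shared by all objects of A (Eq. 1)
    return {m for m in M if all((g, m) in I for g in A)}

def intent_prime(B, G, I):
    # B′ = objects carrying every label of B (Eq. 1)
    return {g for g in G if all((g, m) in I for m in B)}

# Two clusterers over four objects.
G, M, I = partition_context([[0, 0, 1, 1],
                             [0, 0, 0, 1]])
A = {0, 1}
B = extent_prime(A, M, I)      # {(0, 0), (1, 0)}: both runs agree on A
print(intent_prime(B, G, I))   # {0, 1}, i.e. A″ = A, so A is an extent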
Algorithm 1: Main((G, M, I), T)
Input: a partition context (G, M, I) and the number of ensemble clusterers T
Output: a queue S of concepts
1: S = ∅
2: C = ∅
3: for all g ∈ G do
4:   if g ∉ C then
5:     gpp = g″
6:     gp = g′
7:     S.enqueue(gpp, gp)
8:     C = C ∪ gpp
9:   end if
10: end for
11: return Process((G, M, I), T, S)

Thus, the resulting antichain S may not cover all objects, but we can add each non-covered object g to a concept (A, B) ∈ S with the maximal size of the intersection |B ∩ g′|. Traditionally, the algorithm consists of two parts: a wrapper procedure, Main, and a recursive procedure, Process.

Algorithm 2: Process((G, M, I), T, S)
1: Q = S
2: P = ∅
3: Cover = ∅
4: while Q ≠ ∅ do
5:   Q.dequeue(A, B)
6:   if A ∩ Cover = ∅ then
7:     Cover = Cover ∪ A
8:     P.enqueue(A, B)
9:     for all g ∈ min(G \ Cover) do
10:      X = A ∪ {g}
11:      Y = X′
12:      if |Y| ≥ ⌈T/2⌉ then
13:        Z = Y′
14:        if {h | h ∈ Z \ X, h < g} = ∅ then
15:          P.dequeue(A, B)
16:          P.enqueue(Z, Y)
17:          Cover = Cover ∪ Z
18:        end if
19:      end if
20:    end for
21:  end if
22: end while
23: if S = P then
24:   return P
25: end if
26: S = P
27: return Process((G, M, I), T, P)

4 Experimental results

All evaluations are done on synthetic datasets that have been generated using Matlab. Each of the datasets consists of 300 five-dimensional objects comprising three randomly generated spherical Gaussian clusters. The variance of each cluster lies in 0.1–0.3, and its center components are independently generated from the Gaussian distribution N(0, 0.7). Let us denote the partition thus generated by λ, with k_λ clusters. The profile of partitions R = {ρ₁, ρ₂, . . . , ρ_T} for the consensus algorithms is constructed as the result of T runs of the k-means clustering algorithm starting from k random centers.

We carry out the experiments in four settings:
1. Investigation of the influence of the number of true clusters k_λ ∈ {2, 3, 5, 9} under various numbers of minimal votes (Fig. 2):
   a) two clusters case, k_λ = 2, k ∈ {2, 3, 4, 5};
   b) three clusters case, k_λ = 3, k ∈ {2, 3};
   c) five clusters case, k_λ = 5, k ∈ {2, 5};
   d) nine clusters case, k_λ = 9, k ∈ {2, 3, 4, 5, 6, 7, 8, 9}.
2. Investigation of the numbers of clusters of the ensemble clusterers with a fixed number of true clusters k_λ = 5 (Fig. 3):
   a) k = 2; b) k ∈ {2, 3, 4, 5}; c) k ∈ {5}; d) k ∈ {5, 6, 7, 8, 9}; e) k = 9.
3. Investigation of the number of objects N ∈ {100, 300, 500, 1000} (Fig. 4).
4. Comparison with other state-of-the-art algorithms (Fig. 5–8):
   a) two clusters case, k_λ = 2, k ∈ {2, 3, 4, 5};
   b) three clusters case, k_λ = 3, k ∈ {2, 3};
   c) five clusters case, k_λ = 5, k ∈ {2, 5};
   d) nine clusters case, k_λ = 9, k ∈ {2, 3, 4, 5, 6, 7, 8, 9}.

Each experiment encompasses 10 runs for each of the ten generated datasets. Such meta-parameters as the dimension number p = 3, the number of partitions (clusterers) in the ensemble T = 100, and the parameters of the Gaussian distribution have been fixed for each experiment.

After applying the consensus algorithms, the Adjusted Rand Index (ARI) [5] for the obtained consensus partition σ and the generated partition λ is computed as ARI(σ, λ). Given two partitions ρ^a = {R_1^a, . . . , R_{k_a}^a} and ρ^b = {R_1^b, . . . , R_{k_b}^b}, let N_h^a = |R_h^a| be the cardinality of R_h^a, N_hm = |R_h^a ∩ R_m^b|, and N the number of objects; writing C(n, 2) = n(n − 1)/2 for the number of unordered pairs, let

C_a = Σ_h C(N_h^a, 2) = Σ_h N_h^a(N_h^a − 1)/2,

and similarly C_b. Then

ARI(ρ^a, ρ^b) = ( Σ_{h,m} C(N_hm, 2) − C_a C_b / C(N, 2) ) / ( (C_a + C_b)/2 − C_a C_b / C(N, 2) ).   (2)

This criterion expresses the similarity of two partitions; its value is at most 1, where 1 means identical partitions, while values around 0 indicate agreement no better than expected by chance.
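Equation (2) is straightforward to implement; the following sketch, ours and with illustrative names, computes ARI for two partitions given as label sequences, using the pair counts C(n, 2).

from collections import Counter
from math import comb

def ari(rho_a, rho_b):
    # Adjusted Rand Index of two label sequences, as in Eq. (2)
    n = len(rho_a)
    c_a = sum(comb(c, 2) for c in Counter(rho_a).values())  # C_a
    c_b = sum(comb(c, 2) for c in Counter(rho_b).values())  # C_b
    # pair counts of the block intersections N_hm
    s = sum(comb(c, 2) for c in Counter(zip(rho_a, rho_b)).values())
    expected = c_a * c_b / comb(n, 2)
    return (s - expected) / ((c_a + c_b) / 2 - expected)

print(ari([0, 0, 1, 1], [1, 1, 0, 0]))  # identical up to relabeling: 1.0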
4.1 Comparing consensus algorithms

The lattice-based consensus results have been compared with the results of the following algorithms (Fig. 5–8):

– AddRemAdd (ARA) [21,13]
– Voting Scheme (Dimitriadou, Weingessel and Hornik, 2002) [8]
– cVote (Ayad and Kamel, 2010) [9]
– Condorcet and Borda Consensus (Dominguez, Carrie and Pujol, 2009) [10]
– Meta-CLustering Algorithm (MCLA) (Strehl and Ghosh, 2002) [3]
– HyperGraph Partitioning Algorithm (HGPA) [3]
– Cluster-based Similarity Partitioning Algorithm (CSPA) [3]

Fig. 2. Influence of the minimal voting threshold (0–70%) on ARI for different numbers of true clusters (two, three, five, and nine).

To provide the reader with more details, we show the ARI values graphically for each of the ten datasets used. The summarised conclusions are given in the next section.

5 Conclusion

Our experiments lead us to the following conclusions:

– The optimal voting threshold, i.e. the minimum intent size for the resulting antichain of concepts, is not constant; moreover, it is usually not the majority of ensemble members (see Fig. 2).
– Our FCA-based consensus clustering method works better when the number of clusters of the ensemble clusterers is equal to the number of true clusters (see Fig. 3).
– The resulting ARI value depends on the number of objects: the greater the number, the smaller the ARI (see Fig. 4).
– When the number of true clusters is two (and almost always when it is three), our method beats the other algorithms under comparison; in some of these cases the consensus attains 100% accuracy (see Fig. 5–6).
– For larger numbers of clusters, our method is positioned as a median among the methods under comparison (see Fig. 7–8).

Fig. 3. Influence of the numbers of clusters of the ensemble clusterers (k = 2, 2–5, 5, 5–9, 9) on ARI, by dataset (each point is averaged over 10 runs).

Fig. 4. Influence of different numbers of objects (N = 100, 300, 500, 1000) on ARI, by dataset.

Fig. 5. Two clusters. Fig. 6. Three clusters. Fig. 7. Five clusters. Fig. 8. Nine clusters. (ARI by dataset for HGPA, Condorcet, cVote, Vote, Lattice (FCA-Consensus), ARA, Borda, MCLA, and CSPA.)

One straightforward step to be taken is testing our algorithm over real datasets. The algorithm can be modified to work on the space of all partition labels when the number of objects is greater than that of the labels. The algorithm's complexity and time-efficiency should be carefully studied and compared with those of the existing algorithms. An interesting avenue is to consider partition lattices as a search space for finding an optimal partition. For example, one can build a pattern structure [22] over partitions, similar to the one in [23], and analyse the correlation of the stability indices [24] of the partitions, as pattern concepts, with the ARI measure. One may hope that by so doing it could be possible to find or describe "good" regions in the lattice by using the partition union and partition intersection operations.
Acknowledgments

We would like to thank Jaume Baixeries, Amedeo Napoli, Alexei Buzmakov, Mehdi Kaytoue, and Oleg Anshakov for their comments, remarks and help while preparing this paper. The paper was prepared within the framework of the Basic Research Program at the National Research University Higher School of Economics (HSE) and supported within the framework of a subsidy by the Russian Academic Excellence Project "5-100". The third co-author was partially supported by the Russian Foundation for Basic Research, grants no. 16-29-12982 and 16-01-00583.

References

1. Régnier, S.: Sur quelques aspects mathématiques des problèmes de classification automatique. Mathématiques et Sciences humaines 82 (1983) 13–29 (reprint of I.C.C. bulletin 4: 1965, 175–191).
2. Mirkin, B.: An approach to the analysis of non-numerical data. In Bagrinovsky, K., ed.: Mathematical Methods of Modeling and Solutions for Economic Problems, Institute of Economics and Organization of Industrial Production, Siberian Branch of the USSR Academy of Sciences (1969) 141–150 (in Russian).
3. Strehl, A., Ghosh, J.: Cluster ensembles — A knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research 3 (2002) 583–617
4. Kuncheva, L.I., Vetrov, D.P.: Evaluation of stability of k-means cluster ensembles with respect to random initialization. IEEE Trans. Pattern Anal. Mach. Intell. 28(11) (2006) 1798–1808
5. Ghosh, J., Acharya, A.: Cluster ensembles. Wiley Interdisc. Rev.: Data Mining and Knowledge Discovery 1(4) (2011) 305–315
6. Topchy, A.P., Jain, A.K., Punch, W.F.: A mixture model for clustering ensembles. In: Proceedings of the Fourth SIAM International Conference on Data Mining, Lake Buena Vista, Florida, USA, April 22–24, 2004. (2004) 379–390
7. Wang, H., Shan, H., Banerjee, A.: Bayesian cluster ensembles. Statistical Analysis and Data Mining 4(1) (2011) 54–70
8. Dimitriadou, E., Weingessel, A., Hornik, K.: A combination scheme for fuzzy clustering. IJPRAI 16(7) (2002) 901–912
9. Ayad, H., Kamel, M.S.: On voting-based consensus of cluster ensembles. Pattern Recognition 43(5) (2010) 1943–1953
10. Dominguez, X.S., Carrie, J.S., Pujol, F.A.: Fuzzy clusterers combination by positional voting for robust document clustering. Procesamiento del lenguaje natural 43 (2009) 245–253
11. Mirkin, B.: Clustering: A Data Recovery Approach. CRC Press (2012)
12. Guénoche, A.: Consensus of partitions: a constructive approach. Advances in Data Analysis and Classification 5(3) (2011) 215–229
13. Mirkin, B.G., Shestakov, A.: Least square consensus clustering: Criteria, methods, experiments. In: Advances in Information Retrieval - 35th European Conference on IR Research, ECIR 2013, Moscow, Russia, March 24–27, 2013. Proceedings. (2013) 764–767
14. Mirkin, B., Shestakov, A.: A Note on the Effectiveness of the Least Squares Consensus Clustering. Springer New York, New York, NY (2014) 181–185
15. Mirkin, B., Muchnik, I.: Geometrical interpretation of clustering scoring functions. In: Methods for the Analysis of Multivariate Data in Economics, Novosibirsk, Nauka Publisher (1981) 3–11 (in Russian).
16. Hristoskova, A., Boeva, V., Tsiporkova, E.: A Formal Concept Analysis Approach to Consensus Clustering of Multi-Experiment Expression Data. BMC Bioinformatics 15 (2014) 151
17. Domenach, F., Tayari, A.: Implications of Axiomatic Consensus Properties. In: Algorithms from and for Nature and Life: Classification and Data Analysis. Springer International Publishing, Cham (2013) 59–67
18.
Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. 1st edn. Springer-Verlag New York, Inc., Secaucus, NJ, USA (1999)
19. Grätzer, G.: General Lattice Theory. Akademie-Verlag (1978)
20. Kuznetsov, S.O.: A fast algorithm for computing all intersections of objects from an arbitrary semilattice. Nauchno-Tekhnicheskaya Informatsiya Seriya 2 - Informatsionnye protsessy i sistemy (1) (1993) 17–20
21. Mirkin, B.: Core concepts in Data Analysis: summarization, correlation, visualization. Springer (2011)
22. Ganter, B., Kuznetsov, S.O.: Pattern structures and their projections. In: Conceptual Structures: Broadening the Base, 9th International Conference on Conceptual Structures, ICCS 2001, Stanford, CA, USA, July 30–August 3, 2001, Proceedings. (2001) 129–142
23. Codocedo, V., Napoli, A.: Lattice-based biclustering using partition pattern structures. In: ECAI 2014 - 21st European Conference on Artificial Intelligence, 18–22 August 2014, Prague, Czech Republic - Including Prestigious Applications of Intelligent Systems (PAIS 2014). (2014) 213–218
24. Kuznetsov, S.O.: Stability as an estimate of degree of substantiation of hypotheses derived on the basis of operational similarity. Nauchno-Tekhnicheskaya Informatsiya Seriya 2 - Informatsionnye protsessy i sistemy (12) (1990) 21–29