Clustering and Ranking of Image Search Results

Elena Sivogolovko *
University of Saint-Petersburg
efecca@gmail.com

* This work is partially supported by the Russian Foundation for Basic Research under grant 07-07-00268.

Proceedings of the Spring Young Researcher's Colloquium on Database and Information Systems, Moscow, Russia, 2009

Abstract

In a typical content-based image retrieval (CBIR) system, the query result is a set of images sorted by feature similarity with respect to the query. We introduce a new approach to CBIR result representation. We propose that a CBIR system should retrieve image clusters whose elements are sorted by the most meaningful feature similarities. This paper does not present a complete approach but rather work in progress.

1 Introduction

In image search, good organization of the search results is as important as the search accuracy. Many existing CBIR search engines return a large quantity of search results ranked by their relevance to the given query. Users have to go through the list and look for the desired ones. This is a time-consuming task, since the returned results usually cover multiple topics and these topics are mixed together. Things become even worse when one topic dominates but is not what the user desires. A possible solution to this problem is to cluster search results into different semantic groups. In traditional CBIR, image clustering techniques are often used to design a convenient user interface that provides a more meaningful representation of search results. However, as images are usually represented by low-level visual features, it is hard to obtain a good clustering from a semantic perspective, because images with high feature similarity to each other may be very different in terms of semantics. This is known as the semantic gap problem.

In this paper, we consider the problem of clustering and ranking image search results. We propose a framework that represents an image query result as a clustering structure in which the elements of each cluster are sorted by the most meaningful features.

2 Related work

In this section we review graph partitioning and spectral clustering methods, which are the foundation of our framework. Such approaches are successfully used in other CBIR systems, for example F-I-T [6] and ImageRank [7], but these systems do not consider information about the meaning of image features, which can be obtained during the clustering process. We suppose, however, that this kind of information can be useful for search result representation.

2.1 Bipartite graph model

The graph model is widely used in different clustering tasks, such as sparse matrix partitioning, circuit partitioning, image segmentation and document clustering. It can also be used in image clustering. Qiu [4] used an undirected bipartite graph (Figure 1) to represent the relationship between images and their low-level features. The graph is constructed as follows. Assume each image in the database is represented by an m-bin colour histogram, and let H_k = (h_k(c_1), h_k(c_2), ..., h_k(c_m)) denote the histogram of the k-th image, where k = 1, 2, ..., n and the value of h_k(c_i) indicates the association of colour c_i with the k-th image. Then the bipartite graph can be represented by a triplet G = (I, C, W), where I = {i_1, i_2, ..., i_n} is the set of images, C = {c_1, c_2, ..., c_m} is the set of colours and W is the set of edges connecting vertices from different vertex sets, i.e. W = {<i, j> | i in I, j in C}. The weight of an edge, W_ij = h_i(c_j), is the j-th colour bin count of the i-th image.

Figure 1: Images-and-features bipartite graph. Squares and circles represent low-level features (colours in our case) and images, respectively.

Note that we can consider this graph as a similarity graph with similarity function S = W_ij. The adjacency matrix of G can be written as

M = \begin{pmatrix} 0 & W \\ W^T & 0 \end{pmatrix}    (1)

where the first block of rows and columns is indexed by I and the second by C. In this context, co-clustering of images and features can be translated into a graph partitioning problem.
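To make this construction concrete, the following is a minimal NumPy sketch of how the adjacency matrix of equation (1) could be assembled from the colour histograms; the function name build_adjacency and the toy data are illustrative assumptions on our part, not part of [4].

```python
import numpy as np

def build_adjacency(histograms):
    """Assemble the (n+m) x (n+m) bipartite adjacency matrix M of equation (1)
    from an n x m array of colour histograms (one m-bin histogram per image)."""
    W = np.asarray(histograms, dtype=float)   # W[i, j] = h_i(c_j)
    n, m = W.shape
    M = np.zeros((n + m, n + m))
    M[:n, n:] = W       # edges from image vertices to colour vertices
    M[n:, :n] = W.T     # symmetric part: the graph is undirected
    return M

# Toy example: 6 images and 4 colour bins, as in Figure 1.
H = np.random.rand(6, 4)
H = H / H.sum(axis=1, keepdims=True)   # normalise each histogram
M = build_adjacency(H)
```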
2.2 Graph partitioning and spectral co-clustering

The idea of clustering is to separate a set of points into different groups according to their similarities. For data given in the form of a similarity graph, this problem can be restated as follows: we want to find a partition of the graph such that the edges between different groups have very low weight (which means that points in different clusters are dissimilar from each other) and the edges within a group have high weight (which means that points within the same cluster are similar to each other). Given a similarity graph G = <V, E> (in our case V = I ∪ C and E = W) with adjacency matrix M, the simplest and most direct way to construct a partition is to solve the mincut problem. This consists of choosing the partition {A_1, ..., A_k} which minimizes

cut(A_1, ..., A_k) = \sum_{i=1}^{k} cut(A_i, \bar{A}_i)    (2)

where

cut(A, \bar{A}) = \sum_{i \in A, j \in \bar{A}} m_{ij},    (3)

m_ij is the element of the adjacency matrix M corresponding to the edge from vertex i to vertex j, and \bar{A} is the complement of a subset A ⊂ V. The problem is that in many cases the solution of mincut simply consists in separating one individual vertex from the rest of the graph. Of course, this is not what we want to achieve in clustering, as clusters should be reasonably large groups of points. One way to circumvent this problem is to explicitly request that the sets A_1, ..., A_k are "reasonably large". The two most common objective functions which encode this are RatioCut and the normalized cut (Ncut). In RatioCut, the size of a subset A of the graph is measured by its number of vertices |A|, while in Ncut the size is measured by the weight of its edges vol(A):

RatioCut(A_1, ..., A_k) = \sum_{i=1}^{k} \frac{cut(A_i, \bar{A}_i)}{|A_i|}    (4)

Ncut(A_1, ..., A_k) = \sum_{i=1}^{k} \frac{cut(A_i, \bar{A}_i)}{vol(A_i)}    (5)

In this work we use Ncut as the objective function because it addresses both clustering goals mentioned above: Ncut minimizes the between-cluster similarity and maximizes the within-cluster similarity, unlike RatioCut, which addresses only the first of them. Ncut can be rewritten as a trace minimization problem, which leads to retrieving the first k generalized eigenvectors of

L v = \lambda D v    (6)

where D is the diagonal degree matrix with D_ii = \sum_{k=1}^{n} m_ik and L = D − M is the unnormalized graph Laplacian. After that, the desired image clusters can be obtained by running a routine clustering algorithm such as k-means on these eigenvectors.
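This spectral relaxation could be implemented, for example, with SciPy and scikit-learn as sketched below; the function name spectral_coclusters, the dense generalized eigensolver and the use of scikit-learn's k-means are our assumptions, not the author's implementation.

```python
import numpy as np
from scipy.linalg import eigh          # generalized symmetric eigensolver
from sklearn.cluster import KMeans

def spectral_coclusters(M, k):
    """Partition the vertices of the bipartite graph with adjacency matrix M
    into k 'mixed' clusters via the Ncut relaxation (equation (6))."""
    D = np.diag(M.sum(axis=1))         # degree matrix, D_ii = sum_k m_ik
    L = D - M                          # unnormalized graph Laplacian
    # Solve L v = lambda D v; eigh returns eigenvalues in ascending order,
    # so the first k columns are the k smallest generalized eigenvectors.
    # (Assumes every vertex has non-zero degree, so D is positive definite.)
    _, eigvecs = eigh(L, D)
    V = eigvecs[:, :k]                 # row y_i is the embedding of vertex i
    return KMeans(n_clusters=k, n_init=10).fit_predict(V)
```

With M built as in the previous sketch, the first n labels describe the image vertices and the remaining m labels the colour vertices, which yields the mixed clusters used in the next section.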
3 Co-clustering and ranking framework

Suppose the dashed line in Figure 1 shows the partition that minimizes the two-part cut. After applying the clustering algorithm, we obtain two subsets {i_1, i_2, i_3, i_4, c_1, c_2} and {i_5, i_6, c_3, c_4}. We call such clusters mixed. Consequently, the low-level features are divided into two feature clusters {c_1, c_2} and {c_3, c_4}, while the images are divided into two image clusters {i_1, i_2, i_3, i_4} and {i_5, i_6}, respectively. Most CBIR systems return the obtained image subsets and do not use the feature subsets at all. We propose to sort the images in each image cluster using the low-level features from the corresponding feature cluster. We suggest that this kind of ranking helps to improve image retrieval results.

To sort images by more than one feature we need a data fusion method. In general, data fusion is the use of techniques that combine data from multiple sources in order to achieve inferences that are more efficient and potentially more accurate than those obtained from a single source. In our case, a source is a list of images ranked by one of the selected low-level features. Many commonly used data fusion algorithms, such as CombSUM and CombMNZ, are successfully used in text retrieval. CombMNZ is considered to outperform other data fusion algorithms. It works as follows: an element of the resulting ranked list gets a rank equal to the sum of all its ranks in the fused lists, multiplied by the number of lists in which this element occurs with a non-zero rank:

rank_{result}(i) = \sum_{fused\ lists} rank_{list}(i) \cdot nz    (7)

where nz is the number of lists in which the element has a non-zero rank. To summarize, our algorithm for co-clustering and ranking images and features is listed below.

Algorithm 1 Co-clustering and ranking algorithm
Require: Image set I and the vectors H representing the histograms of these images
1: Form a bipartite graph G = <I, C, W>.
2: Compute the matrix M, the degree matrix D with D_ii = \sum_{k=1}^{n} m_ik, and L = D − M.
3: Compute the first k eigenvectors v_1, ..., v_k of the generalized eigenproblem L v = \lambda D v.
4: Let V ∈ R^{n×k} be the matrix containing the vectors v_1, ..., v_k as columns.
5: For i = 1, ..., n let y_i ∈ R^k be the vector corresponding to the i-th row of V.
6: Cluster the points (y_i)_{i=1,...,n} in R^k with the k-means algorithm into mixed clusters C_1, ..., C_k.
7: Extract image clusters from the mixed ones. Rank the images in every cluster using the features from the corresponding feature clusters and the CombMNZ data fusion method.
8: Return the set of image clusters I_1, ..., I_k.

It is important to note that feature values should be normalized before the algorithm is applied to the graph. In fact, the most widely used image features, such as the colour histogram, CPAM histogram and Spatial CPAM, already provide a normalization mechanism.
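As a sketch of how formula (7) and step 7 of Algorithm 1 might be realized, the routine below fuses several ranked lists of image identifiers with CombMNZ; turning list positions into descending rank scores is an assumption of ours, since formula (7) does not fix the rank scale, and the name combmnz is hypothetical.

```python
def combmnz(ranked_lists):
    """Fuse ranked lists of image ids with CombMNZ: sum of per-list rank
    scores multiplied by the number of lists containing the image."""
    scores, counts = {}, {}
    for lst in ranked_lists:
        for position, image_id in enumerate(lst):
            # Better-placed images get larger rank scores.
            rank_score = len(lst) - position
            scores[image_id] = scores.get(image_id, 0) + rank_score
            counts[image_id] = counts.get(image_id, 0) + 1
    fused = {img: scores[img] * counts[img] for img in scores}
    return sorted(fused, key=fused.get, reverse=True)   # best first

# Example: two per-feature rankings of the same image cluster.
print(combmnz([["i1", "i3", "i2"], ["i3", "i1", "i4"]]))
```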
4 Future Work

First, we will test our framework on real image queries. We plan to use two different databases from the CorelPhotoSet collection: the ImageDBCorel database (650 images in 9 semantic classes) and the CorelSmall-100 database (100 images in 16 semantic classes). Second, if our co-clustering and ranking scheme gives good results, we will pursue the following research directions.

As in many other clustering systems, the most difficult question in our co-clustering scheme is "How do we define the number of clusters?". In our case, clustering algorithms such as G-means and X-means, which determine the number of clusters themselves, are not applicable, because we need this number for the eigenvector search before the clustering algorithm starts. There is some research in spectral clustering where the authors propose to use the eigengap to determine a correct number of clusters, so this method can be tested in our framework too.

Concerning the clustering algorithm itself, k-means is an easy-to-implement classical method which can provide a good clustering scheme. In our case its main disadvantage is the assumption that image clusters are crisp, but it is easy to see that in most cases image clusters overlap. Therefore we suppose that fuzzy clustering methods (for example, fuzzy k-means) can provide more relevant results in image clustering than crisp ones, and we will check this hypothesis.

Our framework should also be extended to use multiple feature sets, for example colour histograms and texture features simultaneously, because colour data alone is insufficient for retrieval in collections of heterogeneous images. In this case we face two important problems. The first is the definition of the set of features which should be used for clustering. A wide variety of colour, texture and shape features have been proposed to represent image content, so selecting the most meaningful ones is genuinely difficult. The second is a normalization problem: if we combine all image feature values into one image representation vector, this vector should be normalized, and a suitable normalization method is needed.

5 Conclusion

This research is conducted under the supervision of Boris Novikov. The contribution of this paper is a model of a new co-clustering and ranking scheme. This scheme combines different information retrieval techniques, such as spectral clustering, ranking and data fusion methods, in order to improve image retrieval effectiveness. We expect that our search result representation will be convenient and helpful. We plan to include the developed framework in the CBIR system Foto Finder, which is being created in our research group.

References

[1] Reginald E. Hammah and John H. Curran: A Tutorial on Spectral Clustering. Technical Report No. TR-149, August 2006.
[2] Ilya Markov, Natalia Vassilieva: Building Up Low-Level Centroids for Groups of Perceptually Similar Images.
[3] E.A. Fox, J.A. Shaw: Combination of Multiple Searches. In TREC-2 (1994), 243-249.
[4] Guoping Qiu: Bipartite Graph Partitioning and Content-based Image Clustering.
[5] Manjeet Rege, Ming Dong, Jing Hua: Clustering Web Images with Multi-modal Features. SIGMM'07, September 23-28, 2007.
[6] Bin Gao, Tie-Yan Liu, Tao Qin, Xin Zheng, Qian-Sheng Cheng, Wei-Ying Ma: Web Image Clustering by Consistent Utilization of Visual Features and Surrounding Texts. SIGMM'05, November 6-11, 2005.
[7] Xiaofei He, Wei-Ying Ma, Hongjiang Zhang: ImageRank: Spectral Techniques for Structural Analysis of Image Database.