=Paper=
{{Paper
|id=Vol-515/paper-7
|storemode=property
|title=A Fair Ranking Method for Image Database Retrieval
|pdfUrl=https://ceur-ws.org/Vol-515/livingweb2009_paper7.pdf
|volume=Vol-515
|dblpUrl=https://dblp.org/rec/conf/semweb/CostantiniCCN09
}}
==A Fair Ranking Method for Image Database Retrieval==
A fair ranking method for image database
retrieval
L. Costantinia,b , L. Capodiferroa , M. Carlib , and A. Nerib
a
Fondazione Ugo Bordoni, Roma, Italy
b
Applied Electronics Department, University of Roma TRE, Roma, Italy
Abstract. This work aims at organizing the results of query-by-example
image database management system so that also the less relevant ob-
jects still representing a minority, are presented to the user. Usually,
during image retrieval, the objects very similar to the query are ranked
in the first positions and shown to the user. On the contrary, objects
that slightly differ from the target image, and that are less numerous,
are hardly ever shown to the user. The proposed method increases the
chances for a fair retrieval of multimedia data from digital databases.
Key words: Fair ranking, database extraction, texture analysis.
1 Introduction
Unsupervised or human guided object retrieval is one of the basic functionali-
ties of content-centric Internet services. The first generation of digital database
retrieval systems was built on the use of metadata describing the semantic con-
tent of a multimedia database, usually extracted by manual procedures. How-
ever, Future Internet content aware services require more efficient functionalities
for inspection, crawling, recognition, categorization, and indexing of multimedia
content with minimal human intervention.
The research activity on Content Based Image Retrieval (CBIR) has been
focused on the analysis of local and global image features for the refinement of
the coarse annotation, extracted from the WEB pages containing the image, and
for re-ranking the results obtained by searching those annotations [1], [3], [4],
[5]. In fact, up to the present, traditional search based on keywords has focused
on direct use of image content, because of the difficulty in providing at least a
sketch of the desired picture. Nevertheless, the widespread diffusion of camera-
phones seems to open new opportunities to CBIR for the easiness in providing
samples taken from the real word when browsing an image archive by means of
a cell phone [2].
Current CBIR techniques are based on comparisons of global features like
dominant colors, object shapes, textures as well as of local features relating to
the most salient points, like corners. They evaluate a similarity index computed
on the basis of the features extracted from the reference template and those of the
candidate image. The similarity index can either directly employ these features,
2 L. Costantinia,b , L. Capodiferroa , M. Carlib , and A. Nerib
as in the case of adoption, as index, of the maximum likelihood or of the belief
functional, or can be based on the comparison of the statistical distribution of
the features, as in the case of the use of the Kullback-Leibler divergence. On
the other hand, irrespective of the set of local or global features employed for
comparing image and video contents, current search engines produce a list of
candidates usually sorted with respect to the similarity index. The consequence
is that in the presence of several clusters of candidates, the search engine will
report the elements of the most similar images and, if the number of elements
belonging to that cluster is large enough, the other clusters may be completely
cluttered by the dominant one.
The aim of this contribution is to propose a technique for sorting the list of
potential candidates in such a way that all the relevant clusters are represented in
the list displayed to the user, thus preserving the diversity of possible solutions.
More specifically, a pre-selection of candidates by considering a similarity
index accounting only for a subset of the whole feature set is performed. This
subset can be either predetermined by referring, for instance, to those features
like morphology, that the user regards as more relevant, or automatically selected
as those maximizing like a partial similarity index, as described in the next
section. Then, the candidate set is clustered by extracting the local maxima
of the similarity index evaluated on the full set of features. Finally, for each
local maximum, including the absolute one, a representative set of images to
be presented as query results is selected. The cardinality of each set may be
proportional to the similarity index.
With respect to the evaluation of the partial similarity on the basis of a
predefined feature subset, we observe that the relative importance of each fea-
ture strictly depends on the user needs. Therefore, although default values for
different applications and contests can be specified, the user should have the
possibility to provide this information to the search engine.
In the examples reported in this contribution, it is assumed that morphology
is the most discriminating global feature for the user, while color based features
act as elements discriminating the various clusters. In order to verify the perfor-
mances of the proposed method in highlighting the differences among the search
outcomes, we simplified the set of features employed in the comparison. Thus,
with respect to the MPEG-7 ensemble of global and local features, see [1], we
focused our attention on the texture morphology, as described by the statistics of
the outputs of a set of steerable filters, and on the dominant colors. The simula-
tion results confirm that search results clustering driven by the relative maxima
of the similarity index is a powerful technique for preserving web diversity.
2 The fair similarity ranking
Let us model the global feature space V as an M -dimensional vector space,
partitioned into L subspaces Wj , each one corresponding to a particular feature
set, so that V can be written as direct sum of these subspaces: V = W1 ⊕ W2 ⊕
A fair ranking method for image database retrieval 3
· · ·⊕WM . Given two elements x, y ∈ V , their similarity s(x, y) can be computed:
L
X
s(x, y) = si (xi , yi ), (1)
i=1
where si (xi , yi ) is the similarity index corresponding to the i-th feature set. Let
us consider, without loss of generality, the subset V̂ consisting of N subspaces
corresponding to the first N features.
Let Ck,M be the set of k-combinations of the N -element index set, then for
any k-combination i = (i1 , i2 , . . . , ik−1 , ik ) let us denote with Wi the set obtained
as direct sum of Vi1 , Vi2 , . . . , Vik−1 , Vik , i.e., Wi = Vi1 ⊕ Vi2 ⊕ . . . ⊕ Vik−1 ⊕ Vik .
Then given two elements x, y ∈ V we define as partial similarity s̃k,N (x, y)
based on the best k features out of N as
k
M ax X
s̃k,N (x, y) = sih (xih , yih ). (2)
i ∈ Ck,N
h=1
We observe that the concept of similarity can be easily replaced by the con-
cept of dissimilarity. Consequently, we define as partial dissimilarity d˜k,N (x, y)
based on the best k features out of N as
k
min X ˜
d˜k,N (x, y) = dih (xih , yih ). (3)
i ∈ Ck,N
h=1
The use of dissimilarity in place of similarity is suggested by the fact that
some effective indicators widely used in statistics like the Kullback-Leibler di-
vergence do not satisfy the triangle inequality. Thus, for a given template x,
the set A(x) of candidates can now be computed by thresholding the partial
dissimilarity, namely:
A(x) = {ykd˜k,N (x, y) ≤ γ}. (4)
Finally, clustering of the candidate set A(x) on the basis of the whole feature
space V is performed. The dissimilarity threshold γV employed in the cluster for-
mation is, in general, lower than the threshold γ used in the candidate selection
phase.
3 Laguerre Gauss decomposition and texture
classification
Texture segmentation and classification has been intensively studied and many
texture description algorithms have been proposed including statistical [6] and
feature based methods [7], [8]. In many applications, segmentation and classifi-
cation need to be performed without taking object orientations into account, so
rotation invariant descriptions of texture patterns are employed.
4 L. Costantinia,b , L. Capodiferroa , M. Carlib , and A. Nerib
In this context, a relevant role has been played by the Circular Harmonic
Functions (CHF), initially introduced in the field of optical processing for rota-
tion invariant pattern recognition, and recently casted in a wavelet decomposi-
tion scheme [10], [11]. Since CHFs are intrinsically tuned to image features like
edges, lines, equiangular forks, orthogonal crosses without regard to their orien-
tations, they are good candidates for such applications. A detailed description
of Laguerre Gauss wavelet decomposition can be found in [10].
In this paper, we use the pixel classification algorithm employed by Randen
and Husoy in their feature set comparisons [9] for segmentation. Then we utilize
a multiscale segmentation method, [10], starting from CHF moments. These mo-
ments are obtained by the decomposition of a template on complex CHFs that
form a complete orthogonal basis on a unit disc. Due to a general property of the
CHFs, a pattern can be easily steered by multiplying the expansion coefficients
by complex exponential factors whose phase is proportional to the rotation an-
gle, [12], [13]. As a consequence, rotation invariants can be easily obtained by
considering the magnitude of the expansion coefficients.
In our method, the texture classification is performed by considering both
the structural (morphological) and the color components. To represent a texture
pattern morphology, the expansion coefficients of the Laguerre-Gauss transform
of the luminance component Y corresponding to a finite set of N K order pairs
{(n, k)| n = 1, ..., N, k = 0, ..., K − 1} are employed. In particular n = 1, .., 3
and k = 0, .., 3 at 3 different resolutions have been employed in the examples
reported in the following. Since the magnitude of the transform coefficients is
rotation invariant, we use their statistics. In particular, as already stated in [10]
it is possible to characterize the marginal density of a wavelet decomposition
by using a generalized Gaussian function. This distribution is characterized by
two parameters, α and β, directly related to the mean and variance distribution.
Thus, mean and variance are sufficient to describe the statistical properties of
the Laguerre-Gauss transform coefficient magnitudes of a portion of an image.
As a consequence, for each point of a given region of interest (ROI) we first
compute the KN Laguerre-Gauss transform coefficients. Then, we evaluate the
mean and the variance of their magnitudes inside the ROI, so that to each
pattern a morphology feature vector of length 2N K is associated.
The partial dissimilarity index d˜Y (x, z) associated to the structure of two
textures x, z is then evaluated as the the Kullback-Leibler distance KLD among
the generalized Gaussian probability density functions modeling the statistical
behavior of the wavelet coefficient magnitudes:
h i
s̃Y (x, z) = KLD WL(n) Yx (b, 0, a) , WL(n) Yz (b, 0, a) . (5)
k k
where WL(n) Yx (b, 0, a) represents the Laguerre-Gauss transform at location
k
b, rotation ϕ, and scale a.
In order to evaluate the dissimilarity index related to the chromatic compo-
nents Cb, Cr we characterize them by their mean and their centered moments
A fair ranking method for image database retrieval 5
of the second and third order [2], computed as follows:
Nr XNc
1 X
µ= p (i, j) (6)
Nr Nc i=1 j=1
1/
2
Nr X Nc
1 X 2
σ= (p (i, j) − µ) (7)
Nr Nc i=1 j=1
1/
3
Nr X Nc
1 X 3
t= (p (i, j) − µ) (8)
Nr Nc i=1 j=1
where p represents the pixel chromatic component at location (i, j), Nr and
Nc respectively represent the height and the width of the image. The color
information is therefore represented by a features vector of six elements. In this
case, the color matching is based on the Euclidean distance among vectors.
The partial dissimilarity index based on the chromatic components is therefore
computed as follows
i1/
2
h
2 2 2
d˜Cr (x, z) = [µCr (x) − µCr (z)] + [σCr (x) − σCr (z)] + [tCr (x) − tCr (z)]
(9)
i1/
2
h
2 2 2
d˜Cb (x, z) = [µCb (x) − µCb (z)] + [σCb (x) − σCb (z)] + [tCb (x) − tCb (z)]
(10)
The overall dissimilarity D̃tot between two textures is finally computed as
follows:
h i1/2
D̃tot = d˜Y (x, z) + α d˜Cr (x, z)2 + d˜Cb (x, z)2 (11)
where α is the factor is used to select the relative importance between structural
and chromatic components.
4 Clustering algorithm
The clustering algorithm proposed in the follow aims to realize a data ranking
starting from a query-by-example image where the images are grouped together
in clusters on the basis of a similarity index, while different relevant clusters are
preserved and displayed to the user, thus representing at the same time similarity
and diversity between the data set. The algorithm consists of four main steps:
6 L. Costantinia,b , L. Capodiferroa , M. Carlib , and A. Nerib
1. In the first step the images in the database are ranked according to the
similarity to the query image, on the base of Eq. 11. Both luminance and
color components are used in this phase. The first 32 images are considered
relevant and they will be used in the following steps (see table 4).
2. A new ranking is performed by computing the similarity between the 32
images previously selected with the image ranked the last position of the
previous step. The comparison is made by considering both luminance and
color components. A new cluster is created with the images presenting a
distance value, with respect to the query image, lower than a fixed threshold.
These images are removed from the initial cluster. This step is repeated until
the original cluster is empty. For each cluster a feature vector is computed
by averaging the features of the images belonging to it.
3. The operations performed on the images in the previous step are repeated
on the clusters, by operating on theirs features vectors. This step is iterated
until at least one new cluster is created. If the algorithm does not create any
new cluster, the threshold is increased and the step is repeated again. This
step ends if the threshold becomes higher than a fixed maximum value or
the number of cluster is less than a minimum value (6 in our experiments).
4. By rearranging the clusters created at the previous step considering the
number of elements it is possible to obtain the final classification, table 4.
The image which represents the cluster can be the first one in the cluster or
the image with the most similar representative vector to the representative
vector of the cluster.
In our experiments, the starting value of the threshold in the step 3 is the same of
the threshold used in the step 2. The optimal value is automatically determined
during the process by histogram evaluation.
5 Experimental results
The performance of the proposed method are evaluated by using a database
composed by 640 images. In the database there are 40 classes of images, each
class is composed by 16 images. We analyze the results obtained using each image
in the database as query image. The experimental results are shown in table 5. To
a better understanding of table 5 it is important to clarify the meaning of outsider
and impure cluster. An outsider is an image that the algorithm puts in a wrong
cluster, instead an impure cluster is a cluster which has at least one outsider.
The results show that the algorithm creates a right number of cluster, indeed
the average number of clusters created is 5.46 and the true average number of
clusters is 4.67, furthermore the percentage of impure clusters is very low, only
9.12%.
6 Conclusions
In this contribution a novel method for a fair organization of the results of
a query-by-example retrieval system is presented. The system is able to show,
A fair ranking method for image database retrieval 7
Ranking Image Ranking Image
1 Bark.0000.00 17 Bark.0000.02
2 Bark.0000.08 18 Brick.0005.14
3 Bark.0000.09 19 Bark.0000.11
4 Bark.0000.05 20 Wood.0001.01
5 Bark.0000.12 21 Wood.0001.05
6 Bark.0000.01 22 Flowers.0005.13
7 Bark.0000.04 23 Brick.0005.02
8 Bark.0000.13 24 Leaves.00016.15
9 Bark.0000.06 25 Flowers.0005.12
10 Wood.0001.15 25 Wood.0001.09
11 Bark.0000.10 27 Leaves.0016.09
12 Wood.0001.13 28 Brick.0005.01
13 Wood.0001.02 29 Flowers.0005.09
14 Bark.0000.07 30 Leaves.00016.14
15 Wood.0001.12 31 Leaves.00016.13
16 Wood.0001.08 32 Brick.0005.00
Table 1. Ranking of the first 32 images starting from the query image Bark.0000.00
of the ”Vision Texture” database used for the tests.
Ranking Representant of cluster Ranking Representant of cluster
1 Bark.0000.05 5 Leaves.0016.13
2 Wood.0001.01 6 Flowers.0005.09
3 Bark.0000.11 7 Wood.0001.13
4 Brick.0005.14 8 Wood.0001.12
Table 2. Result of the clustering algorithm obtained by using the image Bark.0000.00.
Total number of cluster created 3496
Total number of impure cluster created 319
Total number of outsiders 974
Average percentage of impure cluster 9.12%
Average of outsiders in each impure cluster 3.05
Average number of cluster created 5.46
True average number of cluster 4.67
Table 3. Performance results.
8 L. Costantinia,b , L. Capodiferroa , M. Carlib , and A. Nerib
together with the most similar match, also some less similar image cluster. In
this way, not only objects very similar to the query are shown to the user. On
the contrary, objects that slightly differ from the target image, and that are less
numerous, are hardly ever shown to the user. The proposed method increases
the chances that all the relevant clusters are represented in the list displayed to
the user, thus preserving the diversity of possible solutions.
References
1. Weiguang Xu, Yafei Zhang, Jianjiang Lu, Ran Li, Zhenghui Xie, ”A Framework of
Web Image Search Engine,” JCAI, pp.522-525, 2009 Int.Joint Conf. on Artificial
Intelligence, 2009.
2. Ruhan He, Kaiming Liu, Naixue Xiong, Yong Zhu, ”Garment Image Retrieval on the
Web with Ubiquitous Camera-Phone,” APSCC, pp.1584-1589, IEEE Asia-Pacific
Services Computing Conf., 2008
3. Ottavio Augusto Bizetto Penatti, Ricardo da Silva Torres, ”Color Descriptors for
Web Image Retrieval: A Comparative Study,” SIBGRAPI, pp.163-170, XXI Brazil-
ian Symp. on Computer Graphics and Image Processing, 2008.
4. Yong Zhu, Naixue Xiong, Jong Hyuk Park, Ruhan He, ”A Web Image Retrieval
Re-ranking Scheme with Cross-Modal Association Rules,” pp.83-86, Int. Symp. on
Ubiquitous Multimedia Computing, 2008
5. Lin Xie, Yao Zhao, Zhenfeng Zhu, ”A Unified System for Web Personal Image
Retrieval,” IIH-MSP, pp.787-790, Int. Conf. on Intelligent Information Hiding and
Multimedia Signal Processing, 2008
6. T. Quack, U. Mönich, L. Thiele, and B.S. Manjunath, ”Cortina: a system for large-
scale, content-based web image retrieval,” MULTIMEDIA ’04: Proceedings of the
12th annual ACM Int. Conf. on Multimedia, pp. 508-511, New York, NY, USA,
2004.
7. K. Stevenson, C. Leung, ”Comparative evaluation of Web image search engines for
multimedia applications,” ICME, IEEE Int. Conf. on Multimedia and Expo, 2005
8. Minh N. Do, Martin Vetterli, ”Wavelet-Based Texture Retrieval Using Generalized
Gaussian Density and Kullback-Leibler Distance”, IEEE Trans. on Image Process-
ing, Vol. 11, No. 2, February 2002
9. Akira Yanagawa, Winston Hsu, Shih-Fu Chang, ”Brief Descriptions of Visual Fea-
tures for Baseline TRECVID Concept Detectors”, ADVENT Technical Report 219-
2006-5 Columbia University, July 2006
10. A. Neri, G. Jacovitti, ”Maximum Likelhood Localization of 2-D Patterns in the
Gauss-Laguerre Transform Doman: Theoretic Framework and Preliminary Results”
in IEEE Trans. on Image Processing, Vol. 13, No. 1, January 2004.
11. G. Jacovitti, A. Neri, ”Multiresolution Circular Harmonic Decomposition”, in
IEEE Trans. on Image Processing, pp. 3242-3247, Vol.48, n. 11, November 2000.
12. J. Rosen, J. Shamir, “Circular Harmonic Phase Filters for Efficient Rotation -
Invariant Pattern Recognition”, Applied Optics, Vol.27, No.14, pp.2895-2899, July
1988.
13. M.Carli, G. Jacovitti, A. Neri, “Generalized Maximum Likelihood Test for Rota-
tion Invariant Pattern Recognition”, SPIE Conf. Photonics East, Boston, November
2000.