=Paper=
{{Paper
|id=Vol-1173/CLEF2007wn-ImageCLEF-TorjmenEt2007
|storemode=property
|title=Using Pseudo-relevance Feedback to Improve Image Retrieval Results
|pdfUrl=https://ceur-ws.org/Vol-1173/CLEF2007wn-ImageCLEF-TorjmenEt2007.pdf
|volume=Vol-1173
|dblpUrl=https://dblp.org/rec/conf/clef/KhemakhemPB07
}}
==Using Pseudo-relevance Feedback to Improve Image Retrieval Results==
Mouna Torjmen, Karen Pinel-Sauvagnat, Mohand Boughanem

IRIT, 118 Route de Narbonne, 31062 Toulouse Cedex 4, France

{torjmen, sauvagna, bougha}@irit.fr

Abstract

In this paper, we propose a pseudo-relevance feedback method to deal with the photographic retrieval and medical retrieval tasks of ImageCLEF 2007. The aim of our participation in ImageCLEF is to evaluate a combination method using both English textual queries and image queries to answer topics. The approach processes image queries and merges their results with those of textual queries in order to improve retrieval. Using textual information and queries alone, we did not obtain good results. To process image queries, we used the FIRE system to rank similar images using low-level features, and we then used the textual information associated with the top-ranked images to construct a new textual query. Results showed the interest of low-level features for processing image queries, as performance increased compared to textual query processing alone. Finally, the best results were obtained by combining the result lists of textual query processing and image query processing with a linear function.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; H.2.3 [Database Management]: Languages—Query Languages

Keywords

Image retrieval, pseudo-relevance feedback

1 Introduction

In image retrieval, one can distinguish two main approaches [16]: (1) context-based image retrieval and (2) content-based image retrieval:

1. The context of an image is all information about the image coming from sources other than the image itself. For the time being, only textual information is used as context. The main problem of this approach is that documents can use different words to describe the same image, or the same words to describe different concepts. Moreover, image queries cannot be processed.

2. Content-based image retrieval (CBIR) systems use low-level image features to return images similar to an example image. The main problem of this approach is that visual similarity does not always correspond to semantic similarity (for example, a CBIR system can return a picture of a blue sky when the example image is a blue car).

Nowadays, most image retrieval systems combine content and context retrieval, in order to take advantage of both methods. Indeed, it has been shown that combining text- and content-based methods for image retrieval consistently improves performance [4]. Images and textual information can be considered as independent, and the content and contextual information of queries can be combined in different ways:

• Image queries and textual queries can be processed separately, and the two result lists are then merged using a linear function [1], [7].

• One can also use a pipeline approach: a first search is done using textual information or content information, and a filtering step is then applied using the other information type to exclude non-relevant images [12].

• Other methods use Latent Semantic Analysis (LSA) techniques to combine visual and textual information, but are not efficient [16], [17].

Some other works propose translation-based methods, in which content and context information are complementary.
The main idea is to extract relations between images and text, and to use them to translate textual information into visual information and vice versa [9]:

• In [8], the authors translate textual queries into visual ones.

• The authors of [2] propose to translate image queries into textual ones, and to process them using textual methods. Results are then merged with those obtained with the textual queries. The authors of [10] also propose to expand the initial textual query with terms extracted thanks to an image query.

For the latter methods, the main problem in constructing a new textual query or expanding an initial textual query is term extraction. The main solution is pseudo-relevance feedback. Using pseudo-relevance feedback in context-based image retrieval to process image queries is slightly different from classic pseudo-relevance feedback. The first step is to use a visual system to process the image queries. The images obtained as results are considered relevant, and their associated textual information is then used to select terms in order to express a new textual query.

The work presented in this paper also proposes to combine context and content information to address the photographic retrieval and medical retrieval tasks. More precisely, we present a method to transform image queries into textual ones. We use XFIRM [14], a structured information retrieval system, to process English textual queries, and the FIRE system [3] to process image queries. The documents corresponding to the images returned by FIRE are used to extract terms that will form a new textual query.

The paper is organized as follows. In Section 2, we describe textual query processing using the XFIRM system. In Section 3, we describe image query processing using, in a first step, the FIRE system, and in a second step, a pseudo-relevance feedback method. In Section 4, we present our combination method, which uses the results of both the XFIRM and FIRE systems. Experiments and results for the two tasks (medical retrieval and photographic retrieval [13], [6]) are presented in Section 5. Finally, we discuss the results in Section 6 and conclude in Section 7.

2 Textual queries processing

The textual information of the collections used for the photographic and medical retrieval tasks [6] is organised in XML. In the indexing phase, we decided to only use document elements containing positive information: <description>, <title>, <notes> and <location>. We then used the XFIRM system [14] to process queries. XFIRM (XML Flexible Information Retrieval Model) uses a relevance propagation method to process textual queries in XML documents. Relevance values are first computed on leaf nodes (which contain textual information), and scores are then propagated along the document tree to evaluate the relevance values of inner nodes.

Let q = t_1, ..., t_n be a textual query composed of n terms. Relevance values of leaf nodes ln are computed with a similarity function RSV(q, ln):

$$RSV(q, ln) = \sum_{i=1}^{n} w_{iq} \cdot w_{iln}, \quad \text{where } w_{iq} = tf_{iq} \text{ and } w_{iln} = tf_{iln} \cdot idf_i \cdot ief_i \qquad (1)$$

w_{iq} and w_{iln} are the weights of term i in query q and leaf node ln respectively. tf_{iq} and tf_{iln} are the frequencies of i in q and ln, idf_i = log(|D|/(|d_i| + 1)) + 1, with |D| the total number of documents in the collection and |d_i| the number of documents containing i, and ief_i is the inverse element frequency of term i, i.e. log(|N|/(|nf_i| + 1)) + 1, where |nf_i| is the number of leaf nodes containing i and |N| is the total number of leaf nodes in the collection. idf_i models the importance of term i in the collection of documents, while ief_i models it in the collection of elements.
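As an illustration, the following is a minimal Python sketch of the leaf-node scoring of equation 1. It is not the XFIRM implementation: all names are ours, the corpus statistics (|D|, |d_i|, |N|, |nf_i|) are assumed to be precomputed, and we assume the +1 in the ief denominator groups as it does in idf.

```python
import math
from collections import Counter

# Minimal sketch of equation (1); illustrative names, not XFIRM's own code.
# doc_count = |D|, doc_freq[t] = |d_t|, leaf_count = |N|, leaf_freq[t] = |nf_t|,
# all assumed precomputed over the collection.

def rsv(query_terms, leaf_terms, doc_count, doc_freq, leaf_count, leaf_freq):
    """RSV(q, ln) = sum_i w_iq * w_iln over the query terms."""
    tf_q = Counter(query_terms)    # w_iq = tf_iq
    tf_ln = Counter(leaf_terms)    # tf_iln
    score = 0.0
    for term, wq in tf_q.items():
        if term not in tf_ln:
            continue
        # idf_i = log(|D| / (|d_i| + 1)) + 1
        idf = math.log(doc_count / (doc_freq.get(term, 0) + 1)) + 1
        # ief_i = log(|N| / (|nf_i| + 1)) + 1 (parenthesization assumed)
        ief = math.log(leaf_count / (leaf_freq.get(term, 0) + 1)) + 1
        score += wq * tf_ln[term] * idf * ief
    return score
```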
Each node n in the document tree is then assigned a relevance score r_n, which is a function of the relevance scores of the leaf nodes it contains and of the relevance value of the whole document:

$$r_n = \rho \cdot |L_n^r| \cdot \sum_{ln_k \in L_n} \alpha^{dist(n, ln_k)-1} \cdot RSV(q, ln_k) + (1 - \rho) \cdot r_{root} \qquad (2)$$

dist(n, ln_k) is the distance between node n and leaf node ln_k in the document tree, i.e. the number of arcs that are necessary to join n and ln_k, and α ∈ ]0..1] adapts the importance of the dist parameter. In all the experiments presented in this paper, α is set to 0.6. L_n is the set of leaf nodes that are descendants of n, and |L_n^r| is the number of leaf nodes in L_n having a non-zero relevance value (according to equation 1). ρ ∈ ]0..1], inspired by the work presented in [11], allows the introduction of document relevance in the relevance evaluation of inner nodes, and r_{root} is the relevance score of the root element, i.e. the relevance score of the whole document, evaluated with equation 2 with ρ = 1.

Finally, the documents d_j containing relevant nodes are retrieved with the following relevance score:

$$r_{xfirm}(d_j) = \max_{n \in d_j} r_n \qquad (3)$$

The images associated with these documents are finally returned by the system to answer the retrieval tasks.

3 Image queries processing

To process image queries, we used a three-step method: (1) a first step processes images using the FIRE system [3], (2) we then use pseudo-relevance feedback to construct new textual queries, and (3) the new textual queries are processed with the XFIRM system.

We first used the FIRE system to get the top K images most similar to the image query. We then get the N associated textual documents (with N ≤ K, because some images do not have associated textual information) and extract the top L terms from them. To select the top L terms, we evaluated two formulas to express the weight w_i of term t_i.

The first formula uses the frequency of term t_i in the N documents:

$$w_i = \sum_{j=1}^{N} tf_{ij} \qquad (4)$$

where tf_{ij} is the frequency of term t_i in document d_j.

The second formula uses term frequencies in the N selected documents, the number of documents among the N selected that contain the term, and a normalized idf of the term in the whole collection:

$$w_i = \left[1 + \log\left(\sum_{j=1}^{N} tf_{ij}\right)\right] \cdot \frac{n_i}{N} \cdot \frac{\log(D/d_i)}{\log(D)} \qquad (5)$$

where n_i is the number of documents among the N associated documents containing the term t_i, D is the number of documents in the collection and d_i is the number of documents in the collection containing t_i. The use of the n_i/N parameter is based on the following idea: a term occurring once in each of n documents is more important, and should be considered more relevant, than a term occurring n times in one document. The log function is applied to the sum of the tf_{ij} because without it, results with or without the n_i/N parameter were almost the same.

We then construct a new textual query with the top L terms selected according to formula 4 or 5, and we process it using the XFIRM system (as explained in Section 2). In the photographic retrieval task, we obtained the following queries for topic Q48, with K = 5 and L ≤ 5:

Textual query using equation 4: "south korea river"
Textual query using equation 5: "south korea night forklift australia"

The original textual query in English was: "vehicle in South Korea". As we can see, the query using equation 5 is more similar to the original query than the one using equation 4.
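To make the term-selection step concrete, here is a minimal sketch of equations 4 and 5, assuming the N feedback documents have already been retrieved via FIRE and tokenized; all function and variable names are hypothetical, not the authors' implementation.

```python
import math
from collections import Counter

# Minimal sketch of the term selection of section 3 (equations 4 and 5).
# feedback_docs: token lists of the N documents associated with FIRE's top-K
# images; coll_size = D; coll_doc_freq[t] = d_t. Names are illustrative.

def select_terms(feedback_docs, coll_size, coll_doc_freq, top_l, equation=5):
    n = len(feedback_docs)            # N
    tf_sum = Counter()                # sum_j tf_ij for each term
    n_i = Counter()                   # number of feedback docs containing t_i
    for doc in feedback_docs:
        counts = Counter(doc)
        tf_sum.update(counts)
        n_i.update(counts.keys())     # each doc counted once per term
    weights = {}
    for term, tf in tf_sum.items():
        if equation == 4:             # w_i = sum_j tf_ij
            weights[term] = tf
        else:                         # equation (5)
            d_i = coll_doc_freq.get(term, 1)  # default avoids division by zero
            weights[term] = ((1 + math.log(tf))
                             * (n_i[term] / n)
                             * math.log(coll_size / d_i) / math.log(coll_size))
    # the top-L weighted terms form the new textual query
    ranked = sorted(weights, key=weights.get, reverse=True)
    return " ".join(ranked[:top_l])
```

With K = 5 and L = 5 on topic Q48, this kind of selection is what produces the example queries shown above.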
4 Combination function

To evaluate the interest of using both content and context information, we combined the results of image query and textual query processing and evaluated new relevance scores r(d_j) for documents d_j:

$$r(d_j) = \lambda \cdot r_{xfirm}(d_j) + (1 - \lambda) \cdot r_{PRF}(d_j) \qquad (6)$$

where r_{xfirm}(d_j) is the relevance score of document d_j according to the XFIRM system (equation 3) and r_{PRF}(d_j) is the relevance score of d_j according to the XFIRM system after image query processing (see Section 3). In order to answer both retrieval tasks, we then return all images associated with the top-ranked documents. Figure 1 illustrates our approach.

Figure 1: Query processing with the combined approach. The image query is processed by the FIRE system; the top K images and their associated XML text yield a new textual query of L terms, processed by the XFIRM system. The original textual query is processed by XFIRM as well. The two lists of documents and relevance scores are merged with the linear combination function, and the images associated with the final document results form the final image results.
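Below is a minimal sketch of the merge in equation 6, assuming each system's output has been reduced to a dictionary from document id to score; the names are ours, and since the paper does not say whether scores are normalized before merging, this sketch combines them as-is.

```python
# Minimal sketch of equation (6); dict-based scores, illustrative names.

def combine(xfirm_scores, prf_scores, lam=0.9):
    """r(d) = lam * r_xfirm(d) + (1 - lam) * r_PRF(d); missing scores count as 0."""
    docs = set(xfirm_scores) | set(prf_scores)
    return {d: lam * xfirm_scores.get(d, 0.0) + (1 - lam) * prf_scores.get(d, 0.0)
            for d in docs}

# Usage: documents are ranked by combined score, and their images returned.
# combined = combine(r_text, r_prf, lam=0.5)
# ranking = sorted(combined, key=combined.get, reverse=True)
```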
5 Evaluation and results

5.1 Photographic Retrieval Task

5.1.1 Evaluation of textual queries

We evaluated English textual queries using the XFIRM system with parameters ρ = 0.9 and ρ = 1. Results, which are almost the same, are presented in Table 1.

Run-id      | ρ   | MAP    | P10    | P20    | P30    | BPREF  | GMAP
RunText0609 | 0.9 | 0.0634 | 0.1400 | 0.1175 | 0.1133 | 0.0719 | 0.0039
RunText061  | 1   | 0.0633 | 0.1400 | 0.1175 | 0.1128 | 0.0719 | 0.0039

Table 1: Textual queries results using the XFIRM system

5.1.2 Evaluation of image queries

Table 2 shows results using the two formulas described in Section 3.

Run-id            | K | L  | ρ   | Eq.   | MAP    | P10    | P20    | P30    | BPREF  | GMAP
RunPRF061tf       | 6 | 5  | 1   | eq. 4 | 0.0634 | 0.1400 | 0.1175 | 0.1133 | 0.0719 | 0.0039
RunPRF061tfnNidf  | 6 | 15 | 1   | eq. 5 | 0.1231 | 0.2100 | 0.2000 | 0.1794 | 0.1384 | 0.0065
RunPRF0609tfnNidf | 6 | 15 | 0.9 | eq. 5 | 0.1252 | 0.2117 | 0.2000 | 0.1794 | 0.1389 | 0.0067

Table 2: Image queries results using pseudo-relevance feedback with the FIRE and XFIRM systems

We notice that the use of term frequency in the selected documents is not enough, and that the importance of the term in the collection needs to be used in the term weighting function (results are better with equation 5 than with equation 4). If we now compare Table 1 and Table 2, we see that processing image queries with the FIRE system and our pseudo-relevance feedback method gives better results than using only the XFIRM system on textual queries. This shows the importance of visual features for retrieving images.

5.1.3 Combination of textual and image queries results

Table 3 shows our results for the combination approach.

Run-id   | K  | λ   | ρ   | Eq.   | MAP    | P10    | P20    | P30    | BPREF  | GMAP
Runcomb1 | 6  | 0.9 | 1   | eq. 4 | 0.1039 | 0.1500 | 0.1242 | 0.1189 | 0.0915 | 0.0311
Runcomb2 | 15 | 0.9 | 0.9 | eq. 5 | 0.1091 | 0.1433 | 0.1292 | 0.1267 | 0.0969 | 0.0291
Runcomb3 | 15 | 0.5 | 1   | eq. 5 | 0.1354 | 0.2217 | 0.1983 | 0.1839 | 0.1402 | 0.0351
Runcomb4 | 15 | 0.9 | 1   | eq. 5 | 0.1308 | 0.2100 | 0.1983 | 0.1867 | 0.1454 | 0.0264

Table 3: Results using the combination function

Let us first compare runs Runcomb1 and Runcomb4, which use eq. 4 with K=6 and eq. 5 with K=15, respectively. For both, we use ρ = 1, L=5 and λ = 0.9 for the combination. Results show that using eq. 5 with K=15 is more efficient than eq. 4 with K=6, which confirms the results obtained using only image queries.

In order to evaluate the combination function, we then use eq. 5, and fix ρ = 1, K=15 and L=5. We test λ = 0.5 and λ = 0.9 (runs Runcomb3 and Runcomb4). Results are almost the same, but weighting the two sources of evidence equally gives slightly better results.

Finally, we vary ρ between 0.9 and 1, and fix equation 5, λ = 0.9 in equation 6, K=15 and L=5 (runs Runcomb4 and Runcomb2). Better results are obtained with ρ = 1, which means that the document relevance should not be taken into account in the evaluation of inner node relevance values (equation 2).

5.2 Medical Retrieval Task

For this task, we only evaluated the combination method described in Section 4. RunComb09 uses equation 5 with ρ = 1, K=15, L=10 and λ = 0.9. RunComb05 uses equation 4 with ρ = 1, K=6, L=5 and λ = 0.5.

Run-id    | Eq.   | L  | K  | λ   | MAP    | R-prec | P10    | P30    | P100
RunComb09 | eq. 5 | 10 | 15 | 0.9 | 0.1297 | 0.1687 | 0.2100 | 0.2122 | 0.1893
RunComb05 | eq. 4 | 5  | 6  | 0.5 | 0.066  | 0.0996 | 0.0833 | 0.11   | 0.1023

Table 4: Results of the Medical retrieval task

Results are significantly better for run RunComb09. However, as many parameters are involved (K, L, λ and the equation used to select terms), it is difficult to conclude which parameters impact the results. Further experiments are thus needed.

6 Discussion

Increasing the amount of textual information used to construct new textual queries from image queries improves results: the number K of images selected from the FIRE results has a great impact on results, and increasing K improves them by introducing relevant information. Another factor influencing results is the number L of new query terms. In our experiments, when K and L increase, the MAP metric also increases.

Moreover, processing textual queries or image queries separately does not give the best results: combining the two sources of evidence clearly improves results.

Finally, we would like to conclude with the type of textual information used. In the Medical and Photographic Retrieval Tasks, textual information is encoded in XML, and as a consequence, we decided to use an XML-oriented information retrieval system (XFIRM) to process textual queries. However, elements are not organized in a hierarchical way as can be the case in XML documents (there are no ancestor-descendant relationships between nodes), and the functions used by the XFIRM system to evaluate node relevance may not be appropriate in that case. Other experiments are consequently needed with a plain-text information retrieval system. Combining the XFIRM system with the FIRE system may however be interesting for fully XML-encoded collections.

7 Conclusion and future work

We participated in the Photographic and Medical Retrieval Tasks of ImageCLEF 2007 in order to evaluate a method using a combined content- and context-based approach to answer topics. We proposed a new pseudo-relevance feedback approach to process image queries, and we tested an XML-oriented system to process textual queries. Results showed the interest of combining the two sources of evidence (content and context) for image retrieval. In future work, we plan to:

• Add low-level feature results extracted from FIRE to the combination function in the Medical Retrieval Task, as visual features are very important in the medical domain.

• Rank images using concept-level features [15] instead of low-level features to construct new textual queries in the Photographic Retrieval Task.

• Use a domain-specific ontology to expand textual queries (original textual queries and queries obtained with our pseudo-relevance feedback approach).

References

[1] Susanne Boll, Wolfgang Klas, and Jochen Wandel. A cross-media adaptation strategy for multimedia presentations. In ACM Multimedia (1), pages 37–46, 1999.

[2] Yih-Chen Chang, Wen-Cheng Lin, and Hsin-Hsi Chen.
A corpus-based relevance feedback approach to cross-language image retrieval. In CLEF, pages 592–601, 2005.

[3] T. Deselaers, D. Keysers, and H. Ney. FIRE — flexible image retrieval engine: ImageCLEF 2004 evaluation. In CLEF Workshop (2004), 2004.

[4] Thomas Deselaers, Henning Müller, Paul Clough, Hermann Ney, and Thomas M. Lehmann. The CLEF 2005 automatic medical image annotation task. International Journal of Computer Vision, 74(1):51–58, August 2007.

[5] N. Fuhr, Mounia Lalmas, S. Malik, and G. Kazai. INEX 2005 workshop proceedings, 2005.

[6] Michael Grubinger, Paul Clough, Allan Hanbury, and Henning Müller. Overview of the ImageCLEF 2007 photographic retrieval task. In Working Notes of the 2007 CLEF Workshop, Budapest, Hungary, September 2007.

[7] Gareth J. F. Jones, Michael Burke, John Judge, Anna Khasin, Adenike M. Lam-Adesina, and Joachim Wagner. Dublin City University at CLEF 2004: Experiments in monolingual, bilingual and multilingual retrieval. In CLEF, pages 207–220, 2004.

[8] Wen-Cheng Lin, Yih-Chen Chang, and Hsin-Hsi Chen. Integrating textual and visual information for cross-language image retrieval. In Proceedings of the Second Asia Information Retrieval Symposium, pages 454–466, 2005.

[9] Wen-Cheng Lin, Yih-Chen Chang, and Hsin-Hsi Chen. Integrating textual and visual information for cross-language image retrieval: A trans-media dictionary approach. Inf. Process. Manage., 43(2):488–502, 2007.

[10] Nicolas Maillot, Jean-Pierre Chevallet, Vlad Valea, and Joo Hwee Lim. IPAL inter-media pseudo-relevance feedback approach to ImageCLEF 2006 photo retrieval. In Working Notes for the CLEF 2006 Workshop, 20–22 September, Alicante, Spain, 2006.

[11] Yosi Mass and Matan Mandelbrod. Experimenting various user models for XML retrieval. In [5], 2005.

[12] Y. Mori, H. Takahashi, and R. Oka. Image-to-word transformation based on dividing and vector quantizing images with words, 1999.

[13] Henning Müller, Thomas Deselaers, Eugene Kim, Jayashree Kalpathy-Cramer, Thomas M. Deserno, Paul Clough, and William Hersh. Overview of the ImageCLEFmed 2007 medical retrieval and annotation tasks. In Working Notes of the 2007 CLEF Workshop, Budapest, Hungary, September 2007.

[14] Karen Sauvagnat. Modèle flexible pour la recherche d'information dans des corpus de documents semi-structurés. PhD thesis, Paul Sabatier University, Toulouse, 2005.

[15] Cees G. M. Snoek, Marcel Worring, Jan C. van Gemert, Jan-Mark Geusebroek, and Arnold W. M. Smeulders. The challenge problem for automated detection of 101 semantic concepts in multimedia. In MULTIMEDIA '06: Proceedings of the 14th annual ACM international conference on Multimedia, pages 421–430, New York, NY, USA, 2006. ACM Press.

[16] Thijs Westerveld. Image retrieval: Content versus context. In Content-Based Multimedia Information Access, RIAO 2000 Conference Proceedings, pages 276–284, April 2000.

[17] R. Zhao and W. Grosky. Narrowing the semantic gap — improved text-based web document retrieval using visual features, 2002.