=Paper=
{{Paper
|id=Vol-1173/CLEF2007wn-ImageCLEF-TorjmenEt2007
|storemode=property
|title=Using Pseudo-relevance Feedback to Improve Image Retrieval Results
|pdfUrl=https://ceur-ws.org/Vol-1173/CLEF2007wn-ImageCLEF-TorjmenEt2007.pdf
|volume=Vol-1173
|dblpUrl=https://dblp.org/rec/conf/clef/KhemakhemPB07
}}
==Using Pseudo-relevance Feedback to Improve Image Retrieval Results==
Mouna Torjmen, Karen Pinel-Sauvagnat, Mohand Boughanem

IRIT, 118 Route de Narbonne, 31062 Toulouse Cedex 4, France

{torjmen, sauvagna, bougha}@irit.fr

Abstract

In this paper, we propose a pseudo-relevance feedback method to deal with the photographic retrieval and medical retrieval tasks of ImageCLEF 2007. The aim of our participation in ImageCLEF is to evaluate a combination method using both English textual queries and image queries to answer topics. The approach processes image queries and merges their results with those of textual queries in order to improve retrieval. Using textual information and queries alone, we did not obtain good results. To process image queries, we used the FIRE system to rank similar images using low-level features, and we then used the textual information associated with the top-ranked images to construct a new textual query. Results showed the interest of low-level features for processing image queries, as performance increased compared to textual query processing alone. Finally, the best results were obtained by combining the result lists of textual query processing and image query processing with a linear function.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software; H.3.7 Digital Libraries; H.2.3 [Database Management]: Languages—Query Languages

Keywords

Image retrieval, pseudo-relevance feedback

1 Introduction

In image retrieval, one can distinguish two main approaches [16]: (1) context-based image retrieval and (2) content-based image retrieval:

1. The context of an image is all information about the image coming from sources other than the image itself. For the time being, only textual information is used as context. The main problem of this approach is that documents can use different words to describe the same image, or the same words to describe different concepts. Moreover, image queries cannot be processed.

2. Content-based image retrieval (CBIR) systems use low-level image features to return images similar to an example image. The main problem of this approach is that visual similarity does not always correspond to semantic similarity (for example, a CBIR system can return a picture of a blue sky when the example image is a blue car).

Nowadays, most image retrieval systems combine content and context retrieval, in order to take advantage of both methods. Indeed, it has been shown that combining text- and content-based methods for image retrieval consistently improves performance [4]. Images and textual information can be considered as independent, and the content and contextual information of queries can be combined in different ways:

• Image queries and textual queries can be processed separately, and the two result lists are then merged using a linear function [1], [7].

• One can also use a pipeline approach: a first search is done using textual information or content information, and a filtering step is then applied using the other information type to exclude non-relevant images [12].

• Other methods use Latent Semantic Analysis (LSA) techniques to combine visual and textual information, but are not efficient [16], [17].

Some other works propose translation-based methods, in which content and context information are complementary.
The main idea is to extract relations between images and text, and to use them to translate textual information into visual information and vice versa [9]:

• In [8], the authors translate textual queries into visual ones.

• The authors of [2] propose to translate image queries into textual ones, and to process them using textual methods. Results are then merged with those obtained with the textual queries. The authors of [10] also propose to expand the initial textual query with terms extracted thanks to an image query.

For the latter methods, the main problem in constructing a new textual query or expanding an initial textual query is term extraction. The main solution is pseudo-relevance feedback. Using pseudo-relevance feedback in context-based image retrieval to process image queries is slightly different from classic pseudo-relevance feedback. The first step is to use a visual system to process the image queries. The images obtained as results are considered relevant, and their associated textual information is then used to select terms in order to express a new textual query.

The work presented in this paper also proposes to combine context and content information to address the photographic retrieval and medical retrieval tasks. More precisely, we present a method to transform image queries into textual ones. We use XFIRM [14], a structured information retrieval system, to process English textual queries, and the FIRE system [3] to process image queries. The documents corresponding to the images returned by FIRE are used to extract terms that will form a new textual query.

The paper is organized as follows. In Section 2, we describe textual query processing using the XFIRM system. In Section 3, we describe image query processing using, in a first step, the FIRE system, and in a second step, a pseudo-relevance feedback method. In Section 4, we present our combination method, which uses the results of both the XFIRM and FIRE systems. Experiments and results for the two tasks (medical retrieval and photographic retrieval [13], [6]) are presented in Section 5. Finally, we discuss the results in Section 6 and conclude in Section 7.

2 Textual queries processing

The textual information of the collections used for the photographic and medical retrieval tasks [6] is organised in XML. In the indexing phase, we decided to only use document elements containing positive information: <description>, <title>, <notes> and <location>. We then used the XFIRM system [14] to process queries. XFIRM (XML Flexible Information Retrieval Model) uses a relevance propagation method to process textual queries in XML documents. Relevance values are first computed on leaf nodes (which contain textual information), and scores are then propagated along the document tree to evaluate the relevance values of inner nodes.

Let q = t_1, ..., t_n be a textual query composed of n terms. Relevance values of leaf nodes ln are computed with a similarity function RSV(q, ln):

$$RSV(q, ln) = \sum_{i=1}^{n} w_{iq} \cdot w_{iln}, \quad \text{where } w_{iq} = tf_{iq} \text{ and } w_{iln} = tf_{iln} \cdot idf_i \cdot ief_i \qquad (1)$$

w_{iq} and w_{iln} are the weights of term i in query q and leaf node ln respectively. tf_{iq} and tf_{iln} are the frequencies of i in q and ln, idf_i = log(|D|/(|d_i| + 1)) + 1, with |D| the total number of documents in the collection and |d_i| the number of documents containing i, and ief_i is the inverse element frequency of term i, i.e. log(|N|/(|nf_i| + 1)) + 1, where |nf_i| is the number of leaf nodes containing i and |N| is the total number of leaf nodes in the collection. idf_i models the importance of term i in the collection of documents, while ief_i models it in the collection of elements.
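As an illustration, the following is a minimal Python sketch of the leaf-node scoring of equation 1. It is not the XFIRM implementation: all names are ours, the corpus statistics (|D|, |d_i|, |N|, |nf_i|) are assumed to be precomputed, and we assume the +1 in the ief denominator groups as it does in idf.

```python
import math
from collections import Counter

# Minimal sketch of equation (1); illustrative names, not XFIRM's own code.
# doc_count = |D|, doc_freq[t] = |d_t|, leaf_count = |N|, leaf_freq[t] = |nf_t|,
# all assumed precomputed over the collection.

def rsv(query_terms, leaf_terms, doc_count, doc_freq, leaf_count, leaf_freq):
    """RSV(q, ln) = sum_i w_iq * w_iln over the query terms."""
    tf_q = Counter(query_terms)    # w_iq = tf_iq
    tf_ln = Counter(leaf_terms)    # tf_iln
    score = 0.0
    for term, wq in tf_q.items():
        if term not in tf_ln:
            continue
        # idf_i = log(|D| / (|d_i| + 1)) + 1
        idf = math.log(doc_count / (doc_freq.get(term, 0) + 1)) + 1
        # ief_i = log(|N| / (|nf_i| + 1)) + 1 (parenthesization assumed)
        ief = math.log(leaf_count / (leaf_freq.get(term, 0) + 1)) + 1
        score += wq * tf_ln[term] * idf * ief
    return score
```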
Each node n in the document tree is then assigned a relevance score r_n, which is a function of the relevance scores of the leaf nodes it contains and of the relevance value of the whole document:

$$r_n = \rho \cdot |L_n^r| \cdot \sum_{ln_k \in L_n} \alpha^{dist(n, ln_k)-1} \cdot RSV(q, ln_k) + (1 - \rho) \cdot r_{root} \qquad (2)$$

dist(n, ln_k) is the distance between node n and leaf node ln_k in the document tree, i.e. the number of arcs that are necessary to join n and ln_k, and α ∈ ]0..1] adapts the importance of the dist parameter. In all the experiments presented in this paper, α is set to 0.6. L_n is the set of leaf nodes that are descendants of n, and |L_n^r| is the number of leaf nodes in L_n having a non-zero relevance value (according to equation 1). ρ ∈ ]0..1], inspired by the work presented in [11], allows the introduction of document relevance in the relevance evaluation of inner nodes, and r_{root} is the relevance score of the root element, i.e. the relevance score of the whole document, evaluated with equation 2 with ρ = 1.

Finally, the documents d_j containing relevant nodes are retrieved with the following relevance score:

$$r_{xfirm}(d_j) = \max_{n \in d_j} r_n \qquad (3)$$

The images associated with these documents are finally returned by the system to answer the retrieval tasks.

3 Image queries processing

To process image queries, we used a three-step method: (1) a first step processes images using the FIRE system [3], (2) we then use pseudo-relevance feedback to construct new textual queries, and (3) the new textual queries are processed with the XFIRM system.

We first used the FIRE system to get the top K images most similar to the image query. We then get the N associated textual documents (with N ≤ K, because some images do not have associated textual information) and extract the top L terms from them. To select the top L terms, we evaluated two formulas to express the weight w_i of term t_i.

The first formula uses the frequency of term t_i in the N documents:

$$w_i = \sum_{j=1}^{N} tf_{ij} \qquad (4)$$

where tf_{ij} is the frequency of term t_i in document d_j.

The second formula uses term frequencies in the N selected documents, the number of documents among the N selected that contain the term, and a normalized idf of the term in the whole collection:

$$w_i = \left[1 + \log\left(\sum_{j=1}^{N} tf_{ij}\right)\right] \cdot \frac{n_i}{N} \cdot \frac{\log(D/d_i)}{\log(D)} \qquad (5)$$

where n_i is the number of documents among the N associated documents containing the term t_i, D is the number of documents in the collection and d_i is the number of documents in the collection containing t_i. The use of the n_i/N parameter is based on the following idea: a term occurring once in each of n documents is more important, and should be considered more relevant, than a term occurring n times in one document. The log function is applied to the sum of the tf_{ij} because without it, results with or without the n_i/N parameter were almost the same.

We then construct a new textual query with the top L terms selected according to formula 4 or 5, and we process it using the XFIRM system (as explained in Section 2). In the photographic retrieval task, we obtained the following queries for topic Q48, with K = 5 and L ≤ 5:

Textual query using equation 4: "south korea river"
Textual query using equation 5: "south korea night forklift australia"

The original textual query in English was: "vehicle in South Korea". As we can see, the query using equation 5 is more similar to the original query than the one using equation 4.
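To make the term-selection step concrete, here is a minimal sketch of equations 4 and 5, assuming the N feedback documents have already been retrieved via FIRE and tokenized; all function and variable names are hypothetical, not the authors' implementation.

```python
import math
from collections import Counter

# Minimal sketch of the term selection of section 3 (equations 4 and 5).
# feedback_docs: token lists of the N documents associated with FIRE's top-K
# images; coll_size = D; coll_doc_freq[t] = d_t. Names are illustrative.

def select_terms(feedback_docs, coll_size, coll_doc_freq, top_l, equation=5):
    n = len(feedback_docs)            # N
    tf_sum = Counter()                # sum_j tf_ij for each term
    n_i = Counter()                   # number of feedback docs containing t_i
    for doc in feedback_docs:
        counts = Counter(doc)
        tf_sum.update(counts)
        n_i.update(counts.keys())     # each doc counted once per term
    weights = {}
    for term, tf in tf_sum.items():
        if equation == 4:             # w_i = sum_j tf_ij
            weights[term] = tf
        else:                         # equation (5)
            d_i = coll_doc_freq.get(term, 1)  # default avoids division by zero
            weights[term] = ((1 + math.log(tf))
                             * (n_i[term] / n)
                             * math.log(coll_size / d_i) / math.log(coll_size))
    # the top-L weighted terms form the new textual query
    ranked = sorted(weights, key=weights.get, reverse=True)
    return " ".join(ranked[:top_l])
```

With K = 5 and L = 5 on topic Q48, this kind of selection is what produces the example queries shown above.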
4 Combination function

To evaluate the interest of using both content and context information, we combined the results of image query and textual query processing and evaluated new relevance scores r(d_j) for documents d_j:

$$r(d_j) = \lambda \cdot r_{xfirm}(d_j) + (1 - \lambda) \cdot r_{PRF}(d_j) \qquad (6)$$

where r_{xfirm}(d_j) is the relevance score of document d_j according to the XFIRM system (equation 3) and r_{PRF}(d_j) is the relevance score of d_j according to the XFIRM system after image query processing (see Section 3). In order to answer both retrieval tasks, we then return all images associated with the top-ranked documents. Figure 1 illustrates our approach.

Figure 1: Query processing with the combined approach. The image query is processed by the FIRE system; the top K images and their associated XML text yield a new textual query of L terms, processed by the XFIRM system. The original textual query is processed by XFIRM as well. The two lists of documents and relevance scores are merged with the linear combination function, and the images associated with the final document results form the final image results.
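Below is a minimal sketch of the merge in equation 6, assuming each system's output has been reduced to a dictionary from document id to score; the names are ours, and since the paper does not say whether scores are normalized before merging, this sketch combines them as-is.

```python
# Minimal sketch of equation (6); dict-based scores, illustrative names.

def combine(xfirm_scores, prf_scores, lam=0.9):
    """r(d) = lam * r_xfirm(d) + (1 - lam) * r_PRF(d); missing scores count as 0."""
    docs = set(xfirm_scores) | set(prf_scores)
    return {d: lam * xfirm_scores.get(d, 0.0) + (1 - lam) * prf_scores.get(d, 0.0)
            for d in docs}

# Usage: documents are ranked by combined score, and their images returned.
# combined = combine(r_text, r_prf, lam=0.5)
# ranking = sorted(combined, key=combined.get, reverse=True)
```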
5 Evaluation and results

5.1 Photographic Retrieval Task

5.1.1 Evaluation of textual queries

We evaluated English textual queries using the XFIRM system with parameters ρ = 0.9 and ρ = 1. Results, which are almost the same, are presented in Table 1.

Run-id      | ρ   | MAP    | P10    | P20    | P30    | BPREF  | GMAP
RunText0609 | 0.9 | 0.0634 | 0.1400 | 0.1175 | 0.1133 | 0.0719 | 0.0039
RunText061  | 1   | 0.0633 | 0.1400 | 0.1175 | 0.1128 | 0.0719 | 0.0039

Table 1: Textual queries results using the XFIRM system

5.1.2 Evaluation of image queries

Table 2 shows results using the two formulas described in Section 3.

Run-id            | K | L  | ρ   | Eq.   | MAP    | P10    | P20    | P30    | BPREF  | GMAP
RunPRF061tf       | 6 | 5  | 1   | eq. 4 | 0.0634 | 0.1400 | 0.1175 | 0.1133 | 0.0719 | 0.0039
RunPRF061tfnNidf  | 6 | 15 | 1   | eq. 5 | 0.1231 | 0.2100 | 0.2000 | 0.1794 | 0.1384 | 0.0065
RunPRF0609tfnNidf | 6 | 15 | 0.9 | eq. 5 | 0.1252 | 0.2117 | 0.2000 | 0.1794 | 0.1389 | 0.0067

Table 2: Image queries results using pseudo-relevance feedback with the FIRE and XFIRM systems

We notice that the use of term frequency in the selected documents is not enough, and that the importance of the term in the collection needs to be used in the term weighting function (results are better with equation 5 than with equation 4). If we now compare Table 1 and Table 2, we see that processing image queries with the FIRE system and our pseudo-relevance feedback method gives better results than using only the XFIRM system on textual queries. This shows the importance of visual features for retrieving images.

5.1.3 Combination of textual and image queries results

Table 3 shows our results for the combination approach.

Run-id   | K  | λ   | ρ   | Eq.   | MAP    | P10    | P20    | P30    | BPREF  | GMAP
Runcomb1 | 6  | 0.9 | 1   | eq. 4 | 0.1039 | 0.1500 | 0.1242 | 0.1189 | 0.0915 | 0.0311
Runcomb2 | 15 | 0.9 | 0.9 | eq. 5 | 0.1091 | 0.1433 | 0.1292 | 0.1267 | 0.0969 | 0.0291
Runcomb3 | 15 | 0.5 | 1   | eq. 5 | 0.1354 | 0.2217 | 0.1983 | 0.1839 | 0.1402 | 0.0351
Runcomb4 | 15 | 0.9 | 1   | eq. 5 | 0.1308 | 0.2100 | 0.1983 | 0.1867 | 0.1454 | 0.0264

Table 3: Results using the combination function

Let us first compare runs Runcomb1 and Runcomb4, which use eq. 4 with K=6 and eq. 5 with K=15, respectively. For both, we use ρ = 1, L=5 and λ = 0.9 for the combination. Results show that using eq. 5 with K=15 is more efficient than eq. 4 with K=6, which confirms the results obtained using only image queries.

In order to evaluate the combination function, we then use eq. 5, and fix ρ = 1, K=15 and L=5. We test λ = 0.5 and λ = 0.9 (runs Runcomb3 and Runcomb4). Results are almost the same, but weighting the two sources of evidence equally gives slightly better results.

Finally, we vary ρ between 0.9 and 1, and fix equation 5, λ = 0.9 in equation 6, K=15 and L=5 (runs Runcomb4 and Runcomb2). Better results are obtained with ρ = 1, which means that the document relevance should not be taken into account in the evaluation of inner node relevance values (equation 2).

5.2 Medical Retrieval Task

For this task, we only evaluated the combination method described in Section 4. RunComb09 uses equation 5 with ρ = 1, K=15, L=10 and λ = 0.9. RunComb05 uses equation 4 with ρ = 1, K=6, L=5 and λ = 0.5.

Run-id    | Eq.   | L  | K  | λ   | MAP    | R-prec | P10    | P30    | P100
RunComb09 | eq. 5 | 10 | 15 | 0.9 | 0.1297 | 0.1687 | 0.2100 | 0.2122 | 0.1893
RunComb05 | eq. 4 | 5  | 6  | 0.5 | 0.066  | 0.0996 | 0.0833 | 0.11   | 0.1023

Table 4: Results of the Medical retrieval task

Results are significantly better for run RunComb09. However, as many parameters are involved (K, L, λ and the equation used to select terms), it is difficult to conclude which parameters impact the results. Further experiments are thus needed.

6 Discussion

Increasing the amount of textual information used to construct new textual queries from image queries improves results: the number K of images selected from the FIRE results has a great impact on results, and increasing K improves them by introducing relevant information. Another factor influencing results is the number L of new query terms. In our experiments, when K and L increase, the MAP metric also increases.

Moreover, processing textual queries or image queries separately does not give the best results: combining the two sources of evidence clearly improves results.

Finally, we would like to conclude with the type of textual information used. In the Medical and Photographic Retrieval Tasks, textual information is encoded in XML, and as a consequence, we decided to use an XML-oriented information retrieval system (XFIRM) to process textual queries. However, elements are not organized in a hierarchical way as can be the case in XML documents (there are no ancestor-descendant relationships between nodes), and the functions used by the XFIRM system to evaluate node relevance may not be appropriate in that case. Other experiments are consequently needed with a plain-text information retrieval system. Combining the XFIRM system with the FIRE system may however be interesting for fully XML-encoded collections.

7 Conclusion and future work

We participated in the Photographic and Medical Retrieval Tasks of ImageCLEF 2007 in order to evaluate a method using a combined content- and context-based approach to answer topics. We proposed a new pseudo-relevance feedback approach to process image queries, and we tested an XML-oriented system to process textual queries. Results showed the interest of combining the two sources of evidence (content and context) for image retrieval. In future work, we plan to:

• Add low-level feature results extracted from FIRE to the combination function in the Medical Retrieval Task, as visual features are very important in the medical domain.

• Rank images using concept-level features [15] instead of low-level features to construct new textual queries in the Photographic Retrieval Task.

• Use a domain-specific ontology to expand textual queries (original textual queries and queries obtained with our pseudo-relevance feedback approach).

References

[1] Susanne Boll, Wolfgang Klas, and Jochen Wandel. A cross-media adaptation strategy for multimedia presentations. In ACM Multimedia (1), pages 37–46, 1999.

[2] Yih-Chen Chang, Wen-Cheng Lin, and Hsin-Hsi Chen.
A corpus-based relevance feedback approach to cross-language image retrieval. In CLEF, pages 592–601, 2005.

[3] T. Deselaers, D. Keysers, and H. Ney. FIRE — flexible image retrieval engine: ImageCLEF 2004 evaluation. In CLEF Workshop (2004), 2004.

[4] Thomas Deselaers, Henning Müller, Paul Clough, Hermann Ney, and Thomas M. Lehmann. The CLEF 2005 automatic medical image annotation task. International Journal of Computer Vision, 74(1):51–58, August 2007.

[5] N. Fuhr, Mounia Lalmas, S. Malik, and G. Kazai. INEX 2005 workshop proceedings, 2005.

[6] Michael Grubinger, Paul Clough, Allan Hanbury, and Henning Müller. Overview of the ImageCLEF 2007 photographic retrieval task. In Working Notes of the 2007 CLEF Workshop, Budapest, Hungary, September 2007.

[7] Gareth J. F. Jones, Michael Burke, John Judge, Anna Khasin, Adenike M. Lam-Adesina, and Joachim Wagner. Dublin City University at CLEF 2004: Experiments in monolingual, bilingual and multilingual retrieval. In CLEF, pages 207–220, 2004.

[8] Wen-Cheng Lin, Yih-Chen Chang, and Hsin-Hsi Chen. Integrating textual and visual information for cross-language image retrieval. In Proceedings of the Second Asia Information Retrieval Symposium, pages 454–466, 2005.

[9] Wen-Cheng Lin, Yih-Chen Chang, and Hsin-Hsi Chen. Integrating textual and visual information for cross-language image retrieval: A trans-media dictionary approach. Inf. Process. Manage., 43(2):488–502, 2007.

[10] Nicolas Maillot, Jean-Pierre Chevallet, Vlad Valea, and Joo Hwee Lim. IPAL inter-media pseudo-relevance feedback approach to ImageCLEF 2006 photo retrieval. In Working Notes for the CLEF 2006 Workshop, 20–22 September, Alicante, Spain, 2006.

[11] Yosi Mass and Matan Mandelbrod. Experimenting various user models for XML retrieval. In [5], 2005.

[12] Y. Mori, H. Takahashi, and R. Oka. Image-to-word transformation based on dividing and vector quantizing images with words, 1999.

[13] Henning Müller, Thomas Deselaers, Eugene Kim, Jayashree Kalpathy-Cramer, Thomas M. Deserno, Paul Clough, and William Hersh. Overview of the ImageCLEFmed 2007 medical retrieval and annotation tasks. In Working Notes of the 2007 CLEF Workshop, Budapest, Hungary, September 2007.

[14] Karen Sauvagnat. Modèle flexible pour la recherche d'information dans des corpus de documents semi-structurés. PhD thesis, Paul Sabatier University, Toulouse, 2005.

[15] Cees G. M. Snoek, Marcel Worring, Jan C. van Gemert, Jan-Mark Geusebroek, and Arnold W. M. Smeulders. The challenge problem for automated detection of 101 semantic concepts in multimedia. In MULTIMEDIA '06: Proceedings of the 14th annual ACM international conference on Multimedia, pages 421–430, New York, NY, USA, 2006. ACM Press.

[16] Thijs Westerveld. Image retrieval: Content versus context. In Content-Based Multimedia Information Access, RIAO 2000 Conference Proceedings, pages 276–284, April 2000.

[17] R. Zhao and W. Grosky. Narrowing the semantic gap — improved text-based web document retrieval using visual features, 2002.