=Paper=
{{Paper
|id=Vol-1176/CLEF2010wn-ImageCLEF-StougiannisEt2010
|storemode=property
|title=IPL at ImageCLEF 2010
|pdfUrl=https://ceur-ws.org/Vol-1176/CLEF2010wn-ImageCLEF-StougiannisEt2010.pdf
|volume=Vol-1176
}}
==IPL at ImageCLEF 2010==
Alexandros Stougiannis, Anestis Gkanogiannis, and Theodore Kalamboukis

Information Processing Laboratory, Department of Informatics, Athens University of Economics and Business, 76 Patission Street, Athens 10434, Greece

stougiannis@gail.com, (utumno,tzk)@aueb.gr

Abstract. This paper describes the participation of the IPL team in the ImageCLEF 2010 campaign. Our research group participated in the ad hoc, case-based, and modality tasks of image retrieval, using only textual data. To this end we performed a quantitative evaluation of the Lucene search engine under several state-of-the-art similarity functions. Our runs performed moderately well under both the MAP and the early precision (P@10) metrics.

===1 Introduction===

This paper presents the participation of IPL in the CLEF 2010 image retrieval task. The main goal of the medical ImageCLEF task is to improve the retrieval of medical images from heterogeneous and multilingual document collections containing images and text. Queries are formulated with sample images and a textual description explaining the search goal.

Image retrieval systems do not currently perform as well as their text counterparts [1]. Although CBIR systems have been studied extensively over the last decades, with significant advances, they still demonstrate poor performance when applied to large databases covering a wide spectrum of imaging modalities. Retrieval performance has, however, been shown to improve when the results of textual and visual techniques are fused.

This year's image collection contains approximately 77,000 images. Some records have only one image while others have many. This makes purely text-based retrieval of images problematic: even when only one image of a multi-image record is relevant to a query, all images of that record are retrieved with equal scores, reducing the precision of the search.

Most of our effort this year was concentrated on a quantitative evaluation of a search engine under several state-of-the-art similarity functions. Our retrieval system is based on Lucene [2], a very popular open-source IR toolkit that has been used in many search-related applications. Our evaluation relied on the CLEF-2009 image collection, a test collection containing about 74,000 images, a set of 25 topics, and the corresponding relevance judgements. To achieve this goal we incorporated three similarity functions into the Lucene search engine, described in more detail in the next section: BM25, the axiomatic similarity function, and the document pivoted normalization function.

Each figure in the database is represented in XML, as in the following simple example:

<pre>
<record>
  <pmid>12345</pmid>
  <articleurl>http://www.someurl.com/somearticle.html</articleurl>
  <caption>some caption</caption>
  <figureid>fig.xyz</figureid>
  <figureids>
    <figureidss>fig.xyz.a</figureidss>
    <figureidss>fig.xyz.b</figureidss>
    <figureidss>fig.xyz.c</figureidss>
  </figureids>
  <meshterms>
    <meshterm>meshterm1</meshterm>
    <meshterm>meshterm2</meshterm>
    <meshterm>meshterm3</meshterm>
    <meshterm>meshterm4</meshterm>
  </meshterms>
</record>
</pre>

We note that several figure records referred to the same image, with the same title and the same caption, so we decided to unify those records into one, with one header figure ("figureid") and the remaining figures listed as related ones.

Thus there are two different fields for the figures of a record (figureid, figureids): the first plays the role of the header figure and the rest the role of related figures. Moreover, the title and caption of the unified figure were merged into a single field named "caption" that contains all the textual information accompanying each figure.

For indexing with Lucene we used our own Analyzer, which performs tokenization, removes stop words, short words (fewer than 2 characters) and long words (more than 50 characters), transforms each token to lower case, and applies the Porter stemmer.
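As an illustration, a minimal sketch of such an analyzer chain using the Lucene 3.x API is given below. The class name and the exact filter order are our own assumptions for this sketch, not the actual code behind the runs:

<pre>
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LengthFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

// Hypothetical analyzer mirroring the steps described above: tokenize,
// lower-case, drop stop words, drop tokens outside 2-50 characters,
// then stem with the Porter stemmer.
public class IPLAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new StandardTokenizer(Version.LUCENE_30, reader);
        stream = new LowerCaseFilter(stream);
        stream = new StopFilter(true, stream, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        stream = new LengthFilter(stream, 2, 50);  // keep tokens of 2-50 chars
        stream = new PorterStemFilter(stream);
        return stream;
    }
}
</pre>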
===2 Ad Hoc Retrieval===

The ad hoc task involves retrieving relevant images using the text associated with each image query. For this task we investigated four similarity functions with the Lucene search engine: the default similarity function [3], BM25 [4, 5], the axiomatic function [6, 7], and the default function with document pivoted normalization [8]. The aim was to examine the behaviour of each of these similarity functions on small documents such as those of the CLEF database. The evaluation was based on the CLEF-2009 database. Below we briefly define each function (a code sketch of the formulas follows Table 1); more details can be found in [11].

1. BM25

<math>score(D,Q)=\sum_{t\in Q\cap D} IDF(t)\cdot\frac{tf(t,D)\cdot(k_1+1)}{tf(t,D)+k_1\cdot\left(1-b+b\cdot\frac{|D|}{avgdl}\right)} \qquad (1)</math>

The inverse document frequency is computed according to the classical BM25 model:

<math>IDF(t)=\log\frac{N-n(t)+0.5}{n(t)+0.5} \qquad (2)</math>

where N is the number of documents in the collection and n(t) is the number of documents in which the term t appears.

2. Axiomatic

The axiomatic approach to developing retrieval functions proposed in [6] showed that the following scoring function is more robust than other state-of-the-art functions, with comparable optimal performance. According to [6, 7], the function below (F2-EXP) is one of the best performing:

<math>score(D,Q)=\sum_{t\in Q\cap D} tf(t,Q)\cdot\left(\frac{N}{df(t)}\right)^{0.35}\cdot\frac{tf(t,D)}{tf(t,D)+0.5+\frac{0.5\cdot|D|}{avgdl}} \qquad (3)</math>

where tf(t,Q) is the number of occurrences of term t in query Q, df(t) is the number of documents that contain term t, |D| is the length of document D, and avgdl is the average document length in the collection.

3. Document Pivoted Normalization

This approach uses the default scoring function of Lucene, but tf(t,D), which by default is given by the square root of the term frequency, is now estimated by relation (4):

<math>tf(t,D)=\frac{1+\log(freq(t,D))}{1+\log(avgFreq(D))} \qquad (4)</math>

where <math>avgFreq(D)=\frac{1}{|U_D|}\sum_{t\in D} freq(t,D)</math> and U_D is the set of distinct terms in D. The length normalization factor becomes <math>lengthNorm=\frac{1}{\sqrt{(1-slope)\cdot pivot+slope\cdot|U_D|}}</math>, where <math>pivot=\frac{1}{N}\sum_{d} |U_d|</math> is the average number of distinct terms per document in the collection.

The results of these four functions on the CLEF-2009 image collection are presented in Table 1.

Table 1. Performance of scoring functions on the CLEF-2009 image collection.

{|
! CLEF-2009 !! Default Lucene !! BM25 !! Axiomatic !! Pivoted Norm
|-
| numrel || 2362 || 2362 || 2362 || 2362
|-
| numrelret || 1844 || 1812 || 1847 || 1847
|-
| map || 0.4070 || 0.3513 || 0.3835 || 0.4132
|-
| R-prec || 0.4462 || 0.3827 || 0.4169 || 0.4546
|-
| bpref || 0.4297 || 0.3614 || 0.3979 || 0.4390
|-
| P5 || 0.6800 || 0.6160 || 0.6480 || 0.6880
|-
| P10 || 0.6040 || 0.5760 || 0.6040 || 0.6520
|-
| P20 || 0.6060 || 0.5340 || 0.5720 || 0.6080
|-
| P30 || 0.5720 || 0.5107 || 0.5293 || 0.5627
|}
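To make the formulas concrete, here is a small, self-contained Java sketch of the per-term contributions of BM25 (1)-(2), F2-EXP (3), and the pivoted tf estimate (4). All names and the toy parameter values are our own illustration; for the actual runs these functions were wired into Lucene's scoring machinery rather than computed standalone:

<pre>
// Hypothetical standalone illustration of formulas (1)-(4); not the
// Similarity subclasses actually plugged into Lucene for the runs.
public final class ScoringFormulas {

    /** Per-term BM25 contribution, eq. (1)-(2), with free parameters k1, b. */
    static double bm25Term(double tfD, double df, double numDocs,
                           double docLen, double avgDocLen,
                           double k1, double b) {
        double idf = Math.log((numDocs - df + 0.5) / (df + 0.5));
        return idf * tfD * (k1 + 1)
                / (tfD + k1 * (1 - b + b * docLen / avgDocLen));
    }

    /** Per-term contribution of the axiomatic F2-EXP function, eq. (3). */
    static double f2expTerm(double tfQ, double tfD, double df,
                            double numDocs, double docLen, double avgDocLen) {
        double idfLike = Math.pow(numDocs / df, 0.35);
        double tfPart = tfD / (tfD + 0.5 + 0.5 * docLen / avgDocLen);
        return tfQ * idfLike * tfPart;
    }

    /** Pivoted tf estimate of eq. (4), replacing Lucene's sqrt(freq);
        natural logarithms are assumed here. */
    static double pivotedTf(double freq, double avgFreqInDoc) {
        return (1.0 + Math.log(freq)) / (1.0 + Math.log(avgFreqInDoc));
    }

    public static void main(String[] args) {
        // Toy numbers: a term occurring once in the query and 3 times in a
        // 120-term document, in a 74,000-document collection where 500
        // documents contain the term and the average document has 100 terms.
        System.out.println(bm25Term(3, 500, 74000, 120, 100, 1.2, 0.75));
        System.out.println(f2expTerm(1, 3, 500, 74000, 120, 100));
        System.out.println(pivotedTf(3, 1.8));
    }
}
</pre>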
Table 2 lists the Ad Hoc runs we submitted, with a short description of each.

Table 2. Ad Hoc runs on the 2010 image collection.

{|
! Ad Hoc - submitted runs !! Description
|-
| ipl aueb AdHoc default TC || Default Lucene indexing, one searchable field ("caption")
|-
| ipl aueb adhoq Pivoting TC || Pivoting normalization Lucene indexing, one searchable field ("caption")
|-
| ipl aueb adhoq default TCg || Default Lucene indexing, one searchable field ("caption"), alternative method for duplicate figure recognition
|-
| ipl aueb adhoq Pivoting TCg || Pivoting normalization Lucene indexing, one searchable field ("caption"), alternative method for duplicate figure recognition
|-
| ipl aueb adhoq default TCM || Default Lucene indexing, two searchable fields ("caption", "mesh")
|-
| ipl aueb adhoq Pivoting TCM || Pivoting normalization Lucene indexing, two searchable fields ("caption", "mesh")
|}

In the Ad Hoc version of the medical retrieval task, the unit of retrieval is a figure, identified by its figureid. The goal is to correctly retrieve the most relevant figures for each of the 16 query figures. To complete this task we created indexes with the Lucene search engine using only the textual information of the figures (title + caption). Moreover, as can be seen in the runs with suffix TCM, the documents were expanded with the MeSH terms. In that case there are two searchable fields, weighted by a ratio of 0.9/0.1 in favour of "caption" over "mesh". The scoring function is given by:

<math>score(Q,D)=0.9\cdot score(Q,D_C)+0.1\cdot score(Q,D_{MeSH}) \qquad (5)</math>

where the field D_C contains the title and caption and D_MeSH the MeSH terms.

Tables 3 and 4 present the top 10 performing participations in the Ad Hoc track based on the MAP metric and on early precision (P@10), respectively.

Table 3. Ad Hoc top 10 performing participations based on MAP.

{|
! Run name !! MAP
|-
| XRCE AX rerank comb,trec || 0.3572
|-
| WIKI AX MOD late,trec || 0.3380
|-
| ipl aueb AdHoc default TC || 0.3235
|-
| ipl aueb adhoq default TCg || 0.3225
|-
| ipl aueb adhoq default TCM || 0.3209
|-
| XRCE CHI2 LOGIT IMG MOD late,trec || 0.3167
|-
| ipl aueb AdHoc pivoting TC || 0.3155
|-
| ipl aueb adhoq Pivoting TCg || 0.3145
|-
| XRCE AF LGD IMG late,trec || 0.3119
|-
| ipl aueb adhoq Pivoting TCM || 0.3102
|}

Table 4. Ad Hoc top 10 performing participations based on early precision (P@10).

{|
! Run name !! P@10
|-
| WIKI AX MOD late,trec || 0.5062
|-
| ipl aueb AdHoc default TC || 0.4687
|-
| ipl aueb adhoq default TCM || 0.4687
|-
| ipl aueb adhoq default TCg || 0.4562
|-
| ipl aueb AdHoc pivoting TC || 0.4500
|-
| ipl aueb adhoq Pivoting TCg || 0.4500
|-
| ipl aueb adhoq Pivoting TCM || 0.4375
|-
| XRCE AX rerank comb,trec || 0.4375
|-
| XRCE AF LGD IMG late,trec || 0.4375
|-
| OHSU pm major all mod || 0.4375
|}

===3 Case-based Retrieval===

In the case-based version of the medical retrieval task, the unit of retrieval is not an image but a case. Cases are identified by a PubMed id (the "pmid" tag in the XML format). Unfortunately, not all cases are marked with a "pmid", so the identifier used is the "articleurl" tag of the XML format; in other words, a case is identified by the web link of the article that describes it.

A simple fusion method was implemented to obtain a single list of relevant cases. The total score of each relevant document is calculated as the sum of the scores of all the retrieved figures in the document:

<math>score(Q,D)=\sum_{F\in D} score(Q,F) \qquad (6)</math>

Finally, the documents are sorted by their final fusion score.
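A minimal sketch of this fusion step, assuming a figure-level result list where each hit carries its article URL and its Lucene score (the type and variable names here are illustrative, not from the actual system):

<pre>
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of eq. (6): collapse a figure-level result
// list into a case-level ranking by summing scores per article URL.
public final class CaseFusion {

    /** One figure-level hit: the case it belongs to and its score. */
    record FigureHit(String articleUrl, double score) {}

    static List<Map.Entry<String, Double>> fuseByCase(List<FigureHit> hits) {
        Map<String, Double> caseScores = new HashMap<>();
        for (FigureHit hit : hits) {
            // score(Q, D) = sum of score(Q, F) over all figures F of case D
            caseScores.merge(hit.articleUrl(), hit.score(), Double::sum);
        }
        List<Map.Entry<String, Double>> ranked =
                new ArrayList<>(caseScores.entrySet());
        ranked.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return ranked;
    }
}
</pre>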
Table 5 summarizes the runs we submitted for the case-based task, with a short description of each.

Table 5. Case-based runs on the 2010 image collection.

{|
! Case-based - submitted runs !! Description
|-
| ipl aueb casebased CT || Unified caption and title in one searchable field ("contents")
|-
| ipl aueb casebased CTM 0.1 || "contents" and "mesh" searchable fields, 0.1 ratio for "mesh"
|-
| ipl aueb casebased CTM 0.2 || "contents" and "mesh" searchable fields, 0.2 ratio for "mesh"
|-
| ipl aueb casebased CTM 0.3 || "contents" and "mesh" searchable fields, 0.3 ratio for "mesh"
|-
| ipl aueb casebased CTM 0.4 || "contents" and "mesh" searchable fields, 0.4 ratio for "mesh"
|-
| ipl aueb casebased CTM 0.5 || "contents" and "mesh" searchable fields, 0.5 ratio for "mesh"
|}

We used two types of indexes for this task. In the first type, only the field "contents" (title + caption) is searchable. At the retrieval phase we take the top 1000 results and iterate over them, adding up the scores of the results with the same "articleurl". This procedure produces the run named "ipl aueb casebased CT". The second type of index has an additional searchable field, "mesh", with which we performed 5 runs. The number in each run's name denotes the weight given to the "mesh" field: a value of 0.1 means a weight of 0.1 for the score of the "mesh" field and 0.9 for the score of the "contents" field.

Table 6 contains the top 10 performing participations in the case-based track based on the MAP metric. IPL was at the 28th position of the ranking, with a MAP of 0.1228.

Table 6. Case-based top-10 performing participations based on MAP.

{|
! Run name !! MAP
|-
| PhybaselinefbWMR 10 0.2sub || 0.3165
|-
| baselinefbWMR 10 0.2sub || 0.2926
|-
| PhybaselinefbWsub || 0.2699
|-
| PhybaselinefbWMD 25 0.2sub || 0.2699
|-
| baselinefbWsub || 0.2553
|-
| IRIT SemAnnotator-1.5.2 BM25 N34 1.res || 0.2521
|-
| IRIT SemAnnotator-2.0 BM25 N34 1.res || 0.2521
|-
| IRIT SemAnnotator-1.5.2 BM25 N34.res || 0.2444
|-
| IRIT SemAnnotator-2.0 BM25 N28 1.res || 0.2444
|-
| PhybaselineRelfbWMR 10 0.2sub || 0.2435
|}

===4 Modality Runs===

For the modality classification task, a list of 2,390 records was given for training. The goal was to correctly classify the 2,620 records of the test set into one of eight modalities: "CT", "GX", "MR", "NM", "PET", "PX", "US", "XR". We used two classifiers for this task: a modified Perceptron-type algorithm we have described in previous work [9] (a generic sketch of the underlying perceptron rule is shown after Table 7), and the well-known SVM classifier [10]. Again we used two types of indexes, with and without MeSH terms. Table 7 contains our results for the modality classification.

Table 7. IPL's modality results based on accuracy.

{|
! Run name !! Accuracy
|-
| ipl aueb rhcpp full CT || 0.74
|-
| ipl aueb rhcpp full CTM || 0.71
|-
| ipl aueb svm full CT || 0.53
|-
| ipl aueb svm full CTM || 0.49
|}
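For orientation, the sketch below shows the textbook perceptron update over sparse text features. This is the plain rule only, not the modified algorithm of [9], and all names are illustrative:

<pre>
import java.util.HashMap;
import java.util.Map;

// A plain binary perceptron over sparse term-weight vectors, shown only
// to fix ideas; the submitted runs used the modified rule of [9].
public final class SimplePerceptron {
    private final Map<String, Double> weights = new HashMap<>();
    private double bias = 0.0;

    double score(Map<String, Double> doc) {
        double s = bias;
        for (Map.Entry<String, Double> e : doc.entrySet()) {
            s += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return s;
    }

    /** Standard perceptron update: shift the weights only on a mistake. */
    void train(Map<String, Double> doc, boolean positive) {
        double y = positive ? 1.0 : -1.0;
        if (y * score(doc) <= 0) {           // misclassified example
            for (Map.Entry<String, Double> e : doc.entrySet()) {
                weights.merge(e.getKey(), y * e.getValue(), Double::sum);
            }
            bias += y;
        }
    }
}
</pre>

Training one such binary classifier per modality and taking the highest-scoring class is the usual one-vs-rest reduction for an 8-class problem like this one.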
===5 Conclusions===

Our baseline textual system performed quite well, with a MAP of 32%. We will continue to improve our image retrieval system by adding more image tags using automatic visual feature extraction and heuristics. The use of the image modality brought only a minimal improvement in performance. As future research we plan to improve our pseudo-relevance feedback strategy.

===6 References===

1. Hersh, W., Muller, H., et al.: Advancing biomedical image retrieval: development and analysis of a test collection. J. Am. Med. Inform. Assoc. 13(5), 488-496, 2006.

2. Lucene. http://lucene.apache.org/java/docs/

3. Lucene's default similarity function. http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/search/Similarity.html

4. Sparck Jones, K., Walker, S., Robertson, S.E.: A probabilistic model of information retrieval: development and comparative experiments. Information Processing & Management 36(6), 779-808, 809-840, 2000.

5. http://en.wikipedia.org/wiki/Okapi_BM25

6. Fang, H., Zhai, C.: An exploration of axiomatic approaches to information retrieval. In: Proceedings of the 2005 ACM SIGIR Conference on Research and Development in Information Retrieval, 2005.

7. Fang, H., Zhai, C.: Evaluation of the default similarity function in Lucene. July 15, 2007.

8. Singhal, A., Buckley, C., Mitra, M.: Pivoted document length normalization. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 21-29, 1996.

9. Gkanogiannis, A., Kalamboukis, T.: A modified and fast perceptron learning rule and its use for tag recommendations in social bookmarking systems. In: Eisterlehner, F., et al. (eds.), ECML PKDD Discovery Challenge 2009 (DC09), International Workshop at ECML/PKDD, Bled, Slovenia, September 7, 2009, 71-83.

10. Joachims, T.: Making large-scale support vector machine learning practical. In: Scholkopf, B., et al. (eds.), Advances in Kernel Methods: Support Vector Learning, MIT Press, 1999, 169-184.

11. http://ipl.cs.aueb.gr/eng/stougiannis/