SINAI at ImageCLEF 2007

M.C. Díaz-Galiano, M.A. García-Cumbreras, M.T. Martín-Valdivia, A. Montejo-Raez, L.A. Ureña-López
University of Jaén, Departamento de Informática
Grupo Sistemas Inteligentes de Acceso a la Información
Campus Las Lagunillas, Ed. A3, E-23071, Jaén, Spain
{mcdiaz,magc,maite,amontejo,laurena}@ujaen.es

Abstract

This paper describes the participation of the SINAI team in the ImageCLEF campaign. The SINAI research group has participated in both the ad hoc task and the medical task, following very different approaches in each case. For the ad hoc task, the main Information Retrieval (IR) system combines the document lists retrieved by two IR systems and uses online translators for the bilingual experiments. For the medical task, we have used the MeSH ontology to expand the queries: query terms are searched in the MeSH ontology in order to add similar terms. We have also processed the set of collections using Information Gain (IG), in the same way as in ImageCLEFmed 2006.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software

Keywords

Visual and textual retrieval, Information Gain, Indexing, Machine Translators, Ontologies, MeSH

1 Introduction

This is the third participation of the SINAI research group in the ImageCLEF campaign. We have participated in the ad hoc task [6] and the medical task [7].

The ad hoc task involves retrieving relevant images using the text associated with each image query. As a cross-language retrieval task, multilingual image retrieval based on query translation can achieve a higher performance than monolingual retrieval. This year, a new IR module has been tested. It works with two different IR systems, and the final relevance list is the result of combining both lists. The Machine Translation module developed last year has been updated and used for the bilingual task. English, Spanish, French, Italian and Portuguese are the languages used this year.

The goal of the medical task is to retrieve relevant images based on an image query. This year, two new collections have been introduced. We have filtered all the collections using IG to select the best tags of each one [3]. Moreover, we have expanded the queries using the MeSH ontology: we have selected terms similar to the query in the ontology and added them to the query itself.

The following section describes the ad hoc experiments. In Section 3, we explain the experiments for the medical task. Finally, conclusions and further work are presented in Section 4.

2 The Ad Hoc Task

Given a multilingual query, the goal of the ad hoc task is to find as many relevant images as possible in an image collection. The aim of the task is to compare results with and without pseudo-relevance feedback (PRF), with or without query expansion, using different methods of query translation, or using different retrieval models and weighting functions.

2.1 Experiments Description

In our experiments we have used the following five languages: English, French, Italian, Portuguese and Spanish. This year we have combined the lists of relevant documents returned by two different IR systems: Lemur (http://www.lemurproject.org) and Jirs [4].
As translation module we have used SINTRAM (SINai TRAnslation Module), our meta machine translation system, which uses several online machine translators for each language pair and implements heuristics to combine the different translations [5]. After extensive testing we found that the best translators were:

• Systran for French, Italian and Portuguese
• Prompt for Spanish

The dataset is the IAPR TC-12 image collection, which consists of 20,000 images taken at different locations around the world and comprises a varying cross-section of still natural images. It includes pictures of a range of sports and actions, photographs of people, animals, cities, landscapes and many other aspects of contemporary life.

The collection has been preprocessed using stopword removal and the Porter stemmer, and has been indexed with both IR systems, namely Lemur and Jirs. One parameter of each experiment is the weighting function, such as Okapi or TF-IDF. Another is whether or not PRF is used.

A simple fusion method has been implemented to obtain a single list of relevant documents. In a first step, both lists are normalized between 0 and 1. Then, the following heuristics are implemented:

• Weighting each list. Some experiments are based on a weighting scheme that gives a percentage of importance to the Lemur list and a different one to the Jirs list. The final score of each relevant document is the sum of its scores in each list multiplied by the corresponding weight. Finally, the documents are sorted by their fusion score.
• Using a threshold. Another heuristic filters relevant documents by a threshold value: if the score of a document is below this threshold, it is not included in the final list. The resulting list is sorted by the score of the included documents.
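The fusion step can be summarized with the following minimal sketch (an illustration under our own naming assumptions, not the actual SINAI implementation; it shows both heuristics, with the threshold applied to the fused scores):

```python
# Minimal sketch of the list fusion described above (illustrative only;
# function names and the dict-based score layout are our assumptions).

def normalize(scores):
    """Rescale the scores of one retrieval list to the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0                      # avoid division by zero
    return {doc: (s - lo) / span for doc, s in scores.items()}

def fuse(lemur, jirs, w_lemur=0.6, w_jirs=0.4, threshold=None):
    """Weighted sum of two normalized relevance lists, optionally thresholded."""
    lemur, jirs = normalize(lemur), normalize(jirs)
    docs = set(lemur) | set(jirs)
    fused = {d: w_lemur * lemur.get(d, 0.0) + w_jirs * jirs.get(d, 0.0)
             for d in docs}
    if threshold is not None:                    # second heuristic
        fused = {d: s for d, s in fused.items() if s >= threshold}
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)

# Example: fuse({"img01": 12.3, "img02": 7.1}, {"img01": 0.8, "img03": 0.5})
```

The default weights of 0.6 for Lemur and 0.4 for Jirs correspond to the best configuration found on the 2006 framework, as described below.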
With the ad hoc 2006 framework (using the same collection, queries and relevance judgements) [1] and the heuristics described above, we have evaluated several configurations in order to find the best ones:

1. The basic case with Lemur uses English queries, Lemur as IR system and Okapi with PRF as weighting function. It obtains a MAP value of 0.1672.
2. The basic case with Jirs uses English queries, Jirs as IR system and Okapi with PRF as weighting function. It obtains a MAP value of 0.1513.
3. For the first heuristic, the weights for Lemur and Jirs vary between 0.1 and 1. For instance, the experiment that weights both lists equally uses 0.5 as the weight for both Lemur and Jirs. The best result was a MAP of 0.1678, obtained with a weight of 0.6 for Lemur and 0.4 for Jirs.
4. For the second heuristic, different values from 0.1 to 0.9 are tested as threshold. The best result was a MAP of 0.1524, obtained with a threshold of 0.1.

Finally, all the Lemur weighting functions have been tested and the best one was again Okapi with feedback, with a result of 0.1672.

2.2 Results and Discussion

With the results obtained on the 2006 framework, the new 2007 queries were run. We sent 15 runs: five using Lemur, five using Jirs and five with the fusion of both lists. The results obtained with each IR system (using only text) and the best MAP for each language are shown in Table 1.

Language     Experiment    IR     Expansion  Weight  MAP     Best MAP
English      EN-EN-Exp2    Lemur  without    Okapi   0.1591  0.2075
English      EN-EN-Exp1    Jirs   without    Okapi   0.1473  0.2075
Spanish      ES-EN-Exp9    Jirs   without    Okapi   0.1555  0.1558
Spanish      ES-EN-Exp10   Lemur  without    Okapi   0.1498  0.1558
Portuguese   PO-EN-Exp8    Lemur  without    Okapi   0.1490  0.1490
Portuguese   PO-EN-Exp7    Jirs   without    Okapi   0.1350  0.1490
French       FR-EN-Exp4    Lemur  without    Okapi   0.1264  0.1362
French       FR-EN-Exp3    Jirs   without    Okapi   0.1195  0.1362
Italian      IT-EN-Exp5    Jirs   without    Okapi   0.1231  0.1341
Italian      IT-EN-Exp6    Lemur  without    Okapi   0.1198  0.1341

Table 1: Summary of results for the photo task: monolingual and bilingual runs with the Lemur and Jirs IR systems.

Good results have been obtained this year with both IR systems. Only the English runs show a loss of MAP of around 25% with respect to the best reported run. Our best Spanish result is close to the best one obtained. For Portuguese we have obtained the best result, and for French and Italian our results are slightly worse, with a loss of MAP of around 8%. The MAP values are low because only the title is used as the query. From these results we can conclude that the Lemur IR system works better than Jirs, but the difference is not very significant.

The results obtained by applying the fusion method and the best MAP for each language are shown in Table 2.

Language     Experiment    IR      Expansion  Weight  MAP     Best MAP
English      EN-EN-Exp11   Fusion  without    Okapi   0.0786  0.2075
Spanish      ES-EN-Exp15   Fusion  without    Okapi   0.0559  0.1558
Portuguese   PO-EN-Exp14   Fusion  without    Okapi   0.0423  0.1490
French       FR-EN-Exp12   Fusion  without    Okapi   0.0323  0.1362
Italian      IT-EN-Exp13   Fusion  without    Okapi   0.0492  0.1341

Table 2: Summary of results for the photo task: monolingual and bilingual runs with list fusion.

The main conclusion for these fusion results is that there may be a problem in the fusion runs, because such poor results are not plausible: experiments on the 2006 framework gave fusion results similar to those obtained with each individual IR system.

3 The Medical Task

The main goal of the medical ImageCLEF task is to improve the retrieval of medical images from heterogeneous and multilingual document collections containing images and text. Queries are formulated with sample images and a textual description explaining the research goal.

For the medical task we have used the list of images retrieved by FIRE (http://www-i6.informatik.rwth-aachen.de/~deselaers/fire.html) [2], which was supplied by the organizers of this track. Last year, our efforts concentrated on manipulating the text descriptions associated with these images and mixing the resulting partial lists with the GIFT lists [3]. We also focused on preprocessing the collection using Information Gain (IG) in order to improve the quality of the results and to automate the tag selection process. This year, however, we have concentrated on improving the queries using the MeSH ontology.
3.1 Preprocessing the Collection

In order to generate the textual collection we have used the ImageCLEFmed.xml file, which links the collections with their images and annotations. It contains external links to the images and to the associated annotations in XML files, using relative paths from the root directory to all the related files.

The entire collection consists of six datasets (CASImage, Pathopic, Peir, MIR, endoscopic and MyPACS) containing about 66,600 images (16,600 more than the previous year). Each subcollection is organized into cases that represent a group of related images and annotations. In every case, a group of images and an optional annotation are given. Each image belongs to a case and has optional associated annotations, which enclose metadata and/or a textual annotation. All the images and annotations are stored in separate files; ImageCLEFmed.xml only contains the connections between collections, cases, images and annotations. The collection annotations are in XML format and most of them are in English.

We have preprocessed the collections to generate one textual document per image [3], and we have used the IG measure to select the best XML tags in the collection. Once the document collection was generated, experiments were conducted with the Lemur information retrieval system by applying the KL-divergence weighting scheme.

3.2 Expanding Queries with the MeSH Ontology

The Medical Subject Headings (MeSH) thesaurus is developed by the National Library of Medicine (http://www.nlm.nih.gov/mesh/). MeSH contains two organization files: an alphabetic list with bags of synonymous and related terms, and a hierarchical organization of descriptors associated with the terms. A term is composed of one or more words.

We have used the bags of terms to expand the queries. If all the words of a term appear in the query, we generate a new expanded query by adding the whole bag of terms. To compare the words of a particular term with those of the query, we first lowercase all the words; stopwords are not removed. In order to reduce the number of terms that could expand the query, we have only used those belonging to the A, C or E categories of MeSH (A: Anatomy; C: Diseases; E: Analytical, Diagnostic and Therapeutic Techniques and Equipment) [8].
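The matching rule can be made concrete with the following minimal sketch (our own illustration, not the actual SINAI code; the mesh_terms mapping from a category-filtered MeSH term to its bag of related terms is an assumed, simplified data structure):

```python
# Illustrative sketch of the MeSH-based expansion described above.
# `mesh_terms` is an assumed structure mapping a MeSH term (restricted to
# categories A, C and E) to its bag of synonymous and related terms.

def expand_query(query, mesh_terms):
    """Append the bag of related terms for every MeSH term fully contained in the query."""
    query_words = set(query.lower().split())          # lowercase, no stopword removal
    expansion = []
    for term, bag in mesh_terms.items():
        if set(term.lower().split()) <= query_words:  # all words of the term occur in the query
            expansion.extend(bag)
    return query + " " + " ".join(expansion) if expansion else query

# Toy example:
# mesh = {"knee joint": ["knee", "joints, knee"], "fracture": ["fractures, bone"]}
# expand_query("x-ray of a knee joint fracture", mesh)
# -> "x-ray of a knee joint fracture knee joints, knee fractures, bone"
```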
3.3 Experiment Description

Our main objective is to investigate the effectiveness of query expansion combined with IG-based tag filtering of the text collection. We have carried out these experiments using corpora with 20%, 30%, 40%, 50% and 60% of the tags of the 2007 collection, because these settings led to the best results on the 2006 corpus.

Finally, the expanded textual list and the FIRE list are merged in order to obtain one final list (FL) with relevant images ranked by relevance. The merging process gives different importance to the visual list (VL) and the textual list (TL):

FL = TL * α + VL * (1 − α)    (1)

We have submitted runs with α set to 0.5, 0.6, 0.7 and 0.8. The baseline experiments contain 100% of the tags. To compare with the mixed results, we have also carried out experiments with α = 1.
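Equation (1) is a simple linear interpolation of the two score lists; a minimal sketch could look as follows (illustrative only; it assumes both lists are already normalized dictionaries of image identifiers to scores, as in the fusion sketch of Section 2.1):

```python
# Sketch of Equation (1): FL = TL * alpha + VL * (1 - alpha). The `textual`
# and `visual` arguments are assumed to map image identifiers to scores
# already normalized to [0, 1].

def merge(textual, visual, alpha=0.8):
    """Interpolate textual and visual relevance scores and rank the images."""
    images = set(textual) | set(visual)
    final = {img: alpha * textual.get(img, 0.0) + (1 - alpha) * visual.get(img, 0.0)
             for img in images}
    return sorted(final.items(), key=lambda x: x[1], reverse=True)

# Example: merge({"img01": 0.9, "img02": 0.4}, {"img02": 0.7, "img03": 0.3})
```

The default α = 0.8 corresponds to the weighting of our best official run (see Table 3).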
3.4 Results and Discussion

More than 100 runs for textual and mixed retrieval were submitted to ImageCLEFmed 2007. Table 3 shows the best results of the groups participating in ImageCLEFmed 2007. In this table we can observe that the best result obtained by our system is the experiment with 100% of the tags and α = 0.8 (80% of textual information).

Experiment                        Precision
LIG-MRIM-LIG MU A.eval            0.3962
SINAI-SinaiC100T80.eval           0.3716
miracleTxtENN.txt.eval            0.3518
OHSU-oshu as is 1000.eval         0.3453
UB-NLM-UBTI 1.eval                0.3230
IPAL-IPAL1 TXT BAY ISA0.1.eval    0.3057
RWTH-FIRE-ME-tr0506.eval          0.3044
GE EN.treceval.eval               0.2714
iclefmed2007 text2 out.txt.eval   0.1976
UNALCO-nni FeatComb.eval          0.0082
DEU CS-DEU R2.eval                0.0028

Table 3: Performance of official runs in Medical Image Retrieval.

In Table 4 we can see the results of the text-only experiments. The best result is a precision of 0.3668, using 100% of the tags in the collection. Surprisingly, the IG selection experiments do not improve the results. However, with 40% of the tags the loss of precision is lower than 5%.

Experiment      Precision
SinaiC20T100    0.2950
SinaiC30T100    0.3340
SinaiC40T100    0.3507
SinaiC50T100    0.3487
SinaiC60T100    0.3452
SinaiC100T100   0.3668

Table 4: Performance of text-only runs in Medical Image Retrieval.

4 Conclusions and Further Work

This year the ad hoc task obtained good results, although always with a low MAP, because only the title was used as the query. It could be interesting to develop a new fusion module and to apply a query expansion module based on Google, tasks on which we are already working.

This year two new collections have been included in ImageCLEFmed 2007. Adding this new information to the collection improves the results obtained with the Lemur IR system. We want to investigate which type of query (textual, visual or mixed) is influenced the most by these new collections. Moreover, our next step will focus on using the UMLS ontology (http://www.nlm.nih.gov/pubs/factsheets/umls.html), in order to exploit the multilingual features of the collections.

5 Acknowledgements

This project has been partially supported by a grant from the Spanish Government, project TIMOM (TIN2006-15265-C06-03).

References

[1] Clough, P., Grubinger, M., Deselaers, T., Hanbury, A., and Müller, H.: Overview of the ImageCLEF 2006 Photographic Retrieval and Object Annotation Tasks. In Proceedings of the Cross Language Evaluation Forum (CLEF 2006), 2006.

[2] Deselaers, T., Weyand, T., Keysers, D., Macherey, W., and Ney, H.: FIRE in ImageCLEF 2005: Combining Content-based Image Retrieval with Textual Information Retrieval. Working Notes of the CLEF Workshop, Vienna, Austria, September 2005.

[3] Díaz-Galiano, M.C., García-Cumbreras, M.A., Martín-Valdivia, M.T., Montejo-Raez, A., and Ureña-López, L.A.: SINAI at ImageCLEF 2006. In Proceedings of the Cross Language Evaluation Forum (CLEF 2006), 2006.

[4] Gómez-Soriano, J.M., Montes-y-Gómez, M., Sanchis-Arnal, E., and Rosso, P.: A Passage Retrieval System for Multilingual Question Answering. 8th International Conference on Text, Speech and Dialogue (TSD 2005), Lecture Notes in Artificial Intelligence (LNCS/LNAI 3658), pp. 443-450, Karlovy Vary, Czech Republic, 2005.

[5] García-Cumbreras, M.A., Ureña-López, L.A., Martínez-Santiago, F., and Perea-Ortega, J.M.: BRUJA System. The University of Jaén at the Spanish Task of QA@CLEF 2006. In Proceedings of the Cross Language Evaluation Forum (CLEF 2006), 2006.

[6] Grubinger, M., Clough, P., Hanbury, A., and Müller, H.: Overview of the ImageCLEF 2007 Photographic Retrieval Task. Working Notes of the 2007 CLEF Workshop, Budapest, Hungary, September 2007.

[7] Müller, H., Deselaers, T., Kim, E., Kalpathy-Cramer, J., Deserno, T.M., Clough, P., and Hersh, W.: Overview of the ImageCLEFmed 2007 Medical Retrieval and Annotation Tasks. Working Notes of the 2007 CLEF Workshop, Budapest, Hungary, September 2007.

[8] Chevallet, J.P., Lim, J.H., and Radhouani, S.: Using Ontology Dimensions and Negative Expansion to Solve Precise Queries in CLEF Medical Task. Working Notes of the 2005 CLEF Workshop, Vienna, Austria, September 2005.