
SINAI at ImageCLEF 2006

M.C. Díaz-Galiano, M.A. García-Cumbreras, M.T. Martín-Valdivia, A. Montejo-Raez, L.A. Ureña-López

University of Jaén


H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval; H.3.4 Systems and Software

This paper describes the participation of the SINAI team in the ImageCLEF campaign. The SINAI research group participated in both the ad hoc task and the medical task. The experiments carried out in the two tasks follow very different approaches.

For the ad hoc task, the main IR system is the same one we used in the 2005 ImageCLEF ad hoc task. The improvement to the ad hoc system is a new Machine Translation module that works with several translators and implements several heuristics. We participated in the English monolingual task and in six bilingual tasks, for Dutch, French, German, Italian, Portuguese and Spanish. The results show that the English monolingual runs perform well (0.2234 is our best result), that precision drops in the bilingual runs, and that some languages, such as German or Spanish, work better than others because of their translations.

For the medical task, this year we carried out new experiments that differ considerably from our ImageCLEFmed2005 ones. First, we processed the set of collections using Information Gain (IG) to determine which tags should be considered in the indexing process. These tags are the ones expected to provide the most relevant and non-redundant information, and they were selected automatically according to our information-based strategy, using the data and relevance assessments from last year.

This year, our goal was to analyze how tag selection may contribute to the quality of the final results. To select a reduced set of tags we computed IG. Eleven different collections were generated according to the percentage of tags with the highest IG values. Finally, only results from experiments with selections of 20%, 30% and 40% of the available tags were submitted, since these reported the best performance on the 2005 data.

Experiments using only the textual query and experiments mixing the textual and visual queries were submitted. For the visual query we used the GIFT lists provided by the organizers. Surprisingly, the system performs better with text retrieval alone than with mixed textual and visual retrieval.

We also tried to show that information filtering through tag selection using information gain improves retrieval results without the need for manual selection, but the results obtained are not conclusive. Unfortunately, the results are not as successful as desired. Due to a processing mistake, all our mixed runs obtain the same result as the visual GIFT baseline (0.0467). At the time of writing this paper we are modifying our system in order to solve this problem.

1 Introduction

This is the second participation of the SINAI research group at the ImageCLEF campaign. We have participated in the ad hoc task and the medical task.

As a cross-language retrieval task, multilingual image retrieval based on query translation can achieve high performance, even higher than that of monolingual retrieval. The ad hoc task involves retrieving relevant images using the text associated with each image query.

The goal of the medical task is to retrieve relevant images based on an image query [1]. For this, the organizers supply a multilingual and visual collection and a set of queries (images, each with an associated short text in English, French and German). We first preprocess the collection using Information Gain (IG). This year, our main goal is to compare the effect of selecting different tags from the collection using this measure. We attempted to choose the tags that provide the best information in order to improve the results. We generated several collections with different numbers of tags depending on their IG. Finally, we only submitted runs on 3 different collections (with 20%, 30% and 40% of the tags) because they reported the best results on the ImageCLEFmed2005 data. For each collection, we first compare the results obtained using only the textual query against the results obtained by combining textual and visual information. Finally, we used different methods to merge the visual and textual results.

The next section describes the ad hoc experiments. In Section 3, we explain the experiments for the medical task. Finally, conclusions and future work are presented in Section 4.

2 The Ad Hoc Task

The goal of the ad hoc task is, given a multilingual query, to find as many relevant images as possible in an image collection.

The purpose of the ad hoc task is to compare results with and without pseudo-relevance feedback, with and without query expansion, using different methods of query translation, or using different retrieval models and weighting functions [2].

2.1 Experiments Description

In our experiments we have used seven languages: Dutch, English, French, German, Italian, Portuguese and Spanish.

Because the 2005 results were quite good, this year we used the same IR system and the same strategies, but introduced a new translation module. This module combines several Machine Translators and implements some heuristics.

The Machine Translators used are the following (in brackets, the languages for which each translator is the default):
• Epals (German and Portuguese)
• Prompt (Spanish)
• Reverso (French)
• Systran (Dutch and Italian)

Some of the heuristics are, for instance, using the translation produced by the default translator, combining the translations of every translator, or combining the words with the highest scores (a word scores two points if it appears in the default translation and one point if it appears in all of the other translations).
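The word-scoring heuristic described above can be sketched as follows. This is only an illustration under stated assumptions, not the module actually used: the function names, the whitespace tokenization and the `min_score` threshold are hypothetical, and the scoring rule is read literally (two points for the default translation, one point when every other translation contains the word).

```python
def score_words(default_translation, other_translations):
    """Score each candidate word of a translated query.

    Assumed reading of the heuristic: +2 if the word appears in the
    default translation, +1 if it appears in all of the other translations.
    """
    default_words = set(default_translation.lower().split())
    other_sets = [set(t.lower().split()) for t in other_translations]
    candidates = default_words.union(*other_sets)
    scores = {}
    for word in candidates:
        score = 0
        if word in default_words:
            score += 2
        if other_sets and all(word in words for words in other_sets):
            score += 1
        scores[word] = score
    return scores


def build_query(default_translation, other_translations, min_score=2):
    """Keep the words whose score reaches min_score (a hypothetical threshold).

    Words are emitted in order of first appearance, default translation first.
    """
    scores = score_words(default_translation, other_translations)
    kept = []
    for text in [default_translation, *other_translations]:
        for word in text.lower().split():
            if scores[word] >= min_score and word not in kept:
                kept.append(word)
    return " ".join(kept)


if __name__ == "__main__":
    default = "church with two towers"
    others = ["church with two steeples", "a church with towers"]
    print(build_query(default, others))
```

With a higher threshold (for example `min_score=3`) only the words on which every translator agrees are kept, which trades recall for precision in the translated query.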

Weight   MAP      Rank
Okapi    0.1602   4/8
Okapi    0.1359   7/8
Tfidf    0.1489   5/8
Tfidf    0.1369   6/8

The dataset is a new collection: IAPR. The IAPR TC-12 image collection consists of 20,000 images taken from locations around the world and comprising a varying cross-section of still natural images. It includes pictures of a range of sports, actions, photographs of people, animals, cities, landscapes and many other aspects of contemporary life.

The collection has been preprocessed by removing stopwords and applying Porter's stemmer.

The collection has been indexed using the LEMUR IR system. It is a toolkit that supports indexing of large-scale text databases, the construction of simple language models for documents, queries, or subcollections, and the implementation of retrieval systems based on language models as well as a variety of other retrieval models. The toolkit is being developed as part of the Lemur Project, a collaboration between the Computer Science Department at the University of Massachusetts and the School of Computer Science at Carnegie Mellon University.

One parameter of each experiment is the weighting function, such as Okapi or TFIDF. Another is whether or not PRF (pseudo-relevance feedback) is used.

2.2 Results and Discussion

All the results were obtained using the title and narrative text of the query, when possible. In the English monolingual task and in the German-English bilingual task we combined the use or not of pseudo-relevance feedback with the weighting function (Okapi or Tfidf).

Table 1 shows the English monolingual results. They show that pseudo-relevance feedback is very important when Okapi is used as the weighting function. The results with Tfidf and with Okapi without PRF are very poor.

Table 2 shows a summary of the experiments submitted and the results obtained for the German-English bilingual runs. In this case we combined the same parameters as in the monolingual task.

The results show a loss of MAP of around 28% between the best monolingual experiment and this bilingual one. Even so, the other results in the English monolingual task are considerably worse than the German bilingual ones.
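As a rough check of this figure, and assuming that 0.2234 (the best English monolingual result quoted at the start of the paper) and 0.1602 (the highest MAP among the Okapi/Tfidf runs listed earlier, taken here to be the best German-English run) are the two values being compared, the relative loss works out as:

```latex
\[
\frac{0.2234 - 0.1602}{0.2234} = \frac{0.0632}{0.2234} \approx 0.283
\]
```

that is, about 28%, which agrees with the loss stated above.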

Finally, Table 3 shows a summary of the experiments submitted and the results obtained for the other five bilingual runs.

The results show that, in general, there is a loss of precision compared to the English monolingual results. The Spanish result is around 17% worse. The other languages degrade the results further.

3 The Medical Task

The main goal of the medical ImageCLEF task is to improve the retrieval of medical images from heterogeneous and multilingual document collections containing images as well as text. Queries are formulated with sample images and a short textual description explaining the research goal. For the medical task, we used the lists of images retrieved by GIFT, which were supplied by the organizers of this track.

Table 3. Experiments submitted for the other five bilingual runs.

Language     Experiment             Initial Query   Expansion
Dutch        sinaiNlEnFbOkapiExp1   title + narr    with
French       sinaiFrEnFbOkapiExp1   title + narr    with
Italian      sinaiItEnFbOkapiExp1   title + narr    with
Portuguese   sinaiPtEnFbOkapiExp1   title + narr    with
Spanish      sinaiEsEnFbOkapiExp1   title + narr    with

Last year, our efforts concentrated on manipulating the text descriptions associated with these images and mixing the partial result lists with the GIFT lists [3]. This year, however, our experiments focus on preprocessing the collection using Information Gain (IG) in order to improve the quality of the results and to automate the tag selection process.

3.1 Preprocessing the Collection

In order to generate the textual collection we used the ImageCLEFmed.xml file, which links the collections with their images and annotations. It has external links to the images and to the associated annotations in XML files. It contains relative paths, from the root directory, to all the related files.

The entire collection consists of 4 datasets (CASImage, Pathopic, Peir and MIR) containing about 50,000 images. Each subcollection is organized into cases that represent a group of related images and annotations. Each case consists of a group of images and an optional annotation. Each image is part of a case and has optional associated annotations, which contain metadata and/or a textual annotation. All of the images and annotations are stored in separate files. ImageCLEFmed.xml only contains the connections between collections, cases, images, and annotations.

The collection annotations are in XML format. The majority of the annotations are in English, but a significant number are in French (in the CASImage collection) and German (in the Pathopic collection), and a few cases do not contain any annotation at all. The quality of the texts varies across collections and even within the same collection.

For the MIR subset, specifically designed regular expressions were applied to extract the different segments of information, due to the lack of predefined XML tags. In this way, information such as the identifier string, authors, date and so on was extracted from the corpus.

We generate one textual document per image, where the document identifier is the name of the image and the document text is the XML annotation associated with that image. If a case contains several images, the text is copied once for each of them.
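A minimal sketch of this document-generation step is shown below. The in-memory representation of a case (a dictionary with the hypothetical keys `images` and `annotation`) is an assumption made for illustration; the actual collection is read from ImageCLEFmed.xml and the linked annotation files.

```python
def build_documents(cases):
    """Create one text document per image.

    `cases` is an iterable of dicts with two hypothetical keys:
      - "images": list of image names belonging to the case
      - "annotation": annotation text of the case (may be None or missing)
    The image name is used as the document identifier, and the case
    annotation is copied to every image of the case.
    """
    documents = {}
    for case in cases:
        annotation = case.get("annotation") or ""
        for image_name in case["images"]:
            documents[image_name] = annotation
    return documents


if __name__ == "__main__":
    sample_cases = [
        {"images": ["img001.jpg", "img002.jpg"], "annotation": "chest x-ray, frontal view"},
        {"images": ["img003.jpg"], "annotation": None},
    ]
    for doc_id, text in build_documents(sample_cases).items():
        print(doc_id, "->", repr(text))
```

Cases without any annotation produce empty documents here; whether such images should be indexed at all is a design choice the sketch leaves open.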

We used English for the document collection as well as for the queries. Thus, the French annotations in the CASImage collection were translated into English and then incorporated into the collection. The Pathopic collection has annotations in both English and German. We only used the English annotations to generate the Pathopic documents, discarding the German ones.

3.2 Information Gain and Tag Selection

Last year, almost all tags were used to generate the final corpus. Only those tags that did not seem to provide any information, such as the LANGUAGE tag, were removed. This year, the tags were selected according to the amount of information they theoretically supply. For this, we used the information gain measure as a method to select the best tags in the collection.

The main goal was to determine whether a corpus in which the tags with low IG have been discarded yields higher performance. The aim is to eliminate those tags that do not provide further information or that introduce noise and therefore degrade the results.

Initially, experiments with only the 10%, 20%, 30%, ..., 100% of tags with the highest IG were performed, using the 2005 data for evaluation. Once the results were analyzed, the most accurate results were obtained with 20%, 30% and 40% of the available tags, so these are the collections used in the experiments submitted to the 2006 campaign.

The method consists of computing the information gain of every tag in every subcollection. Since each subcollection (CASImage, Pathopic, Peir and MIR) has a different set of tags, the information gain was calculated using each subcollection as scope, isolating each one from the others. Let C be the set of cases and E the set of values of the E tag; then the formula applied is as follows:

\[
IG(C|E) = H(C) - H(C|E) \tag{1}
\]

where

IG(C|E) is the information gain for the E tag,

H(C) is the entropy of the set of cases C, and

H(C|E) is the relative entropy of C given the E tag.

In order to calculate this value, we compute the entropy of the set of cases C as:

\[
H(C) = -\sum_{i=1}^{|C|} p(c_i) \log_2 p(c_i) = -\sum_{i=1}^{|C|} \frac{1}{|C|} \log_2 \frac{1}{|C|} = -\log_2 \frac{1}{|C|} \tag{2}
\]

And the entropy of the set of cases C conditioned by the tag E would be:

\[
H(C|E) = \sum_{j=1}^{|E|} \frac{|C_{e_j}|}{|C|} \left( -\sum_{i=1}^{|C_{e_j}|} \frac{1}{|C_{e_j}|} \log_2 \frac{1}{|C_{e_j}|} \right) = -\sum_{j=1}^{|E|} \frac{|C_{e_j}|}{|C|} \log_2 \frac{1}{|C_{e_j}|} \tag{3}
\]

where C_{e_j} is the subset of cases in C having the tag E set to the value e_j (this value is a combination of words where order does not matter).

Therefore, we can conclude the final equation for the computation of the information gain supplied by a given tag E over the set of cases C as follows:

\[
IG(C|E) = -\log_2 \frac{1}{|C|} + \sum_{j=1}^{|E|} \frac{|C_{e_j}|}{|C|} \log_2 \frac{1}{|C_{e_j}|} \tag{4}
\]
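The final equation can be sketched in Python as shown below, together with the selection of a given fraction of the highest-IG tags. This is an illustration under assumptions rather than the code used for the runs: the function names are hypothetical, every case is assumed to supply a (possibly empty) value for the tag, a tag value is normalized by sorting its lowercased words so that word order does not matter, and the number of tags kept is obtained by rounding.

```python
import math
from collections import Counter


def information_gain(tag_values):
    """Information gain of one tag over a set of cases.

    `tag_values` holds one string per case: the value of the tag for that
    case. Since -log2(1/x) = log2(x), the final equation becomes
    log2|C| - sum_j (|C_ej| / |C|) * log2|C_ej|.
    """
    n_cases = len(tag_values)
    if n_cases == 0:
        return 0.0
    # Group cases by tag value, ignoring word order (assumed normalization).
    group_sizes = Counter(tuple(sorted(v.lower().split())) for v in tag_values)
    entropy = math.log2(n_cases)
    conditional = sum(
        (size / n_cases) * math.log2(size) for size in group_sizes.values()
    )
    return entropy - conditional


def select_top_tags(ig_by_tag, fraction):
    """Keep the given fraction of tags with the highest IG (at least one)."""
    ranked = sorted(ig_by_tag, key=lambda tag: (-ig_by_tag[tag], tag))
    n_keep = max(1, round(fraction * len(ranked)))
    return ranked[:n_keep]


if __name__ == "__main__":
    values_by_tag = {
        "DIAGNOSIS": ["lung cancer", "cancer lung", "fracture", "pneumonia"],
        "LANGUAGE": ["en", "en", "en", "en"],
    }
    ig = {tag: information_gain(vals) for tag, vals in values_by_tag.items()}
    print(ig)
    print(select_top_tags(ig, 0.5))
```

A tag whose value is identical for every case gets an IG of zero, while a tag with a distinct value in every case reaches the maximum, log2 of the number of cases.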

For every tag in every collection, its information gain is computed. Then, the tags selected to build the final collection are those with the highest IG values. Once the document collection was generated, experiments were conducted with the LEMUR information retrieval system, applying the KL-divergence weighting scheme.

3.3 Experiment Description

Our main goal is to investigate the effectiveness of filtering tags using IG in the text collection. For this, we carried out several experiments using the ImageCLEFmed2005 data in order to determine the best tag percentage.

Table 4. Text-only retrieval runs.

Experiment
IPAL Textual CDW (best result)
SinaiOnlytL30
SinaiOnlytL40
SinaiOnlytL20

First, we carried out experiments with 10%, 20%, ..., 100% of the tags and evaluated the results with the relevance assessments of the 2005 collection. Based on the results obtained, we only submitted runs with 20%, 30% and 40% of the tags for the 2006 collection, because these corpora reported the best results. Thus, for each experiment we submitted 3 runs (one per corpus, generated with 20%, 30% and 40% of all available tags).

We also wanted to compare the results obtained when we only use the text associated with the query topic against the results obtained when we merge visual and textual information. For this, a first experiment was performed as the baseline case. This experiment simply consists of taking the text associated with each query as a new textual query. Each textual query is then submitted to the LEMUR system. The resulting list is directly the baseline run.

The remaining experiments start from the ranked lists provided by the GIFT tool. The organizers provide a list of relevant images generated by GIFT for each query. For each list/query we applied automatic textual query expansion using the text associated with the top-ranked images in the GIFT list. Specifically, we added the text associated with the first four images in the GIFT list to the original textual query in order to generate a new textual query. The new textual query is then submitted to the LEMUR system to obtain a new ranked list. Thus, for each original query we have 2 partial lists: one (expanded) text list and one GIFT list. The last step consists of merging these partial lists in order to obtain one final list (FL) with the images ranked by relevance. The merging process gives different weights to the visual list (VL) and the textual list (TL):

\[
FL = VL \cdot \alpha + TL \cdot \beta, \quad \text{with } \alpha + \beta = 1 \tag{5}
\]

In order to set these parameters we again ran some experiments with the 2005 collection, varying $\alpha$ and $\beta$ in the range [0,1] with step 0.1 (i.e., 0, 0.1, 0.2, ..., 0.9 and 1). After analyzing the results, we submitted runs with $\beta$ set to 0.5, 0.6 and 0.7 for the 2006 collection.
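The expansion and merging steps described above can be sketched as follows. This is a simplified illustration, not the system used for the submitted runs: the function names and data structures are hypothetical, scores from both lists are assumed to be already comparable (for example, normalized to [0, 1]), and an image missing from one list is assumed to score 0 in that list.

```python
def expand_query(query_text, gift_ranking, annotations, top_k=4):
    """Append the text of the top_k GIFT images to the textual query.

    `gift_ranking` is a list of image names ordered by GIFT, and
    `annotations` maps an image name to its text (both hypothetical).
    """
    extra = [annotations[img] for img in gift_ranking[:top_k] if annotations.get(img)]
    return " ".join([query_text, *extra])


def merge_lists(visual_scores, textual_scores, alpha, beta):
    """Weighted merge FL = VL * alpha + TL * beta, with alpha + beta = 1.

    Both arguments map image names to scores. Returns (image, score)
    pairs sorted by decreasing merged score, ties broken by image name.
    """
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha and beta must sum to 1")
    images = set(visual_scores) | set(textual_scores)
    merged = {
        img: alpha * visual_scores.get(img, 0.0) + beta * textual_scores.get(img, 0.0)
        for img in images
    }
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    annotations = {"g1": "lung nodule", "g2": "chest ct"}
    print(expand_query("show me lung images", ["g1", "g2", "g3"], annotations))

    visual = {"a": 1.0, "b": 0.5}
    textual = {"b": 1.0, "c": 0.8}
    for image, score in merge_lists(visual, textual, alpha=0.4, beta=0.6):
        print(image, round(score, 3))
```

Setting `beta` to 1 reduces the merge to the text-only ranking and setting `alpha` to 1 reproduces the visual ranking, which gives a quick sanity check for any implementation of the merging step.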

These 3 experiments and the baseline experiment (which only uses the textual information of the query) were run over the 3 different corpora generated with 20%, 30% and 40% of the tags. All textual experiments were carried out with LEMUR using pseudo-relevance feedback and the KL-divergence weighting scheme, as pointed out previously. In summary, we submitted 12 runs.

3.4 Results

In total, 31 text-only runs and 37 mixed-retrieval runs were submitted to ImageCLEFmed2006.

Table 4 shows the results for text-only retrieval with the SINAI system. Unfortunately, due to a processing mistake, all our mixed runs obtain the same result as the visual GIFT baseline (0.0467). At the time of writing this paper we are modifying our system in order to solve this problem.

4 Conclusions and Further Work

In this paper, we have presented the experiments carried out in our participation in the ImageCLEF campaign.

For the ad hoc task, we tried a new Machine Translation module. The application of some heuristics improves the bilingual results, but it is necessary to study the queries with the poorest results in order to improve them. Our next work will be to improve the results of the IR phase by applying new techniques for query expansion (using thesauri or web information) and to investigate other heuristics for the Machine Translation module.

For the medical task, we tried to apply Information Gain in order to improve the results. Unfortunately, the performance obtained was very poor. In addition, our system has a processing mistake in the mixed runs, so the results obtained are not conclusive. However, we consider that Information Gain is a good idea and a widely used method to filter information without the need for manual tag selection. Thus, our next step will focus on improving the visual lists and the merging process.

5 Acknowledgements

This work has been partially supported by a grant from the Spanish Government, project R2D2 (TIC2003-07158-C04-04).

[1] Paul Clough, Michael Grubinger, Thomas Deselaers, Allan Hanbury, Henning Müller: Overview of the ImageCLEF 2006 photo retrieval and object annotation tasks. In Proceedings of the Cross Language Evaluation Forum (CLEF 2006), 2006.

[2] Henning Müller, Thomas Deselaers, Thomas Lehmann, Paul Clough, William Hersh: Overview of the ImageCLEFmed 2006 medical retrieval and annotation tasks. In Proceedings of the Cross Language Evaluation Forum (CLEF 2006), 2006.

[3] Martín-Valdivia, M.T., García-Cumbreras, M.A., Díaz-Galiano, M.C., Ureña-López, L.A., Montejo-Raez, A.: SINAI at ImageCLEF 2005. In Proceedings of the Cross Language Evaluation Forum (CLEF 2005), 2005.