                      SINAI at ImageCLEF 2006
              M.C. Díaz-Galiano, M.A. García-Cumbreras, M.T. Martín-Valdivia,
                              A. Montejo-Raez, L.A. Ureña-López
                       University of Jaén. Departamento de Informática
                   Grupo Sistemas Inteligentes de Acceso a la Información
                    Campus Las Lagunillas, Ed. A3, E-23071, Jaén, Spain
                    {mcdiaz,magc,maite,amontejo,laurena}@ujaen.es


                                             Abstract
     This paper describes the SINAI team's participation in the ImageCLEF campaign. The
     SINAI research group participated in both the ad hoc task and the medical task. The
     experiments carried out in the two tasks follow very different approaches.
         For the ad hoc task, the main IR system used is the same as in the 2005 Im-
     ageCLEF ad hoc task. The improvement over the 2005 system is a new Machine Trans-
     lation module that works with several translators and implements several heuristics.
     We have participated in the English monolingual task and in six bilingual tasks for
     the following languages: Dutch, French, German, Italian, Portuguese and Spanish. The
     results obtained show that the English monolingual results are good (0.2234 is our
     best result), that there is a loss of precision in the bilingual runs, and that some
     languages, such as German or Spanish, work better than others because of the quality
     of the translations.
         For the medical task, this year we carried out new experiments, very different
     from the ImageCLEFmed 2005 ones. First of all, we have processed the set of collections
     using Information Gain (IG) to determine the best tags to be considered in the indexing
     process. These tags are those supposed to provide the most relevant and non-redundant
     information, and they have been selected automatically according to our information-based
     strategy along with the data and relevance assessments from last year.
         This year, our goal was to analyze how tag selection may contribute to the quality
     of the final results. In order to select a reduced set of tags we have computed the IG
     of each tag. Eleven different collections were generated according to the percentage of
     tags with the highest IG values. Finally, only the results of experiments with selections
     of 20%, 30% and 40% of the available tags were submitted, since they reported the best
     performance on the 2005 data.
         Experiments using only the textual query and experiments mixing the textual and
     visual queries have been submitted. For the visual queries we have used the GIFT lists
     provided by the organization. Surprisingly, the system performs better with text
     retrieval alone than with mixed textual and visual retrieval.
         On the other hand, we have tried to show that information filtering through tag
     selection using Information Gain improves retrieval results without the need for a
     manual selection, but the results obtained are not conclusive. Unfortunately, the
     results are not as successful as desired: due to a processing mistake, all our mixed
     runs obtain the same results as the visual GIFT baseline (0.0467). At the time of
     writing we are modifying our system in order to solve this problem.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Infor-
mation Search and Retrieval; H.3.4 Systems and Software
Keywords
Visual and text retrieval, Information Gain, Indexing, Machine Translators


1     Introduction
This is the second participation of the SINAI research group at the ImageCLEF campaign. We
have participated in the ad hoc task and the medical task.
    As a cross-language retrieval task, multilingual image retrieval based on query translation
can achieve high performance, sometimes even higher than monolingual retrieval. The ad hoc task
involves retrieving relevant images using the text associated with each image query.
    The goal of the medical task is to retrieve relevant images based on an image query [1]. For
this, the organizers supply a multilingual visual collection and a set of queries (each with
associated images and a short text in English, French and German). We first preprocess the
collection using Information Gain (IG). This year, our main goal is to compare the effect of
selecting different tags from the collection using this measure. We have attempted to choose the
tags providing the most information in order to improve the results obtained. We have generated
several collections with different numbers of tags depending on their IG. Finally, we have only
submitted runs on 3 different collections (at 20%, 30% and 40% of the tags) because they reported
the best results on the ImageCLEFmed 2005 data. For each collection, we first compare the results
obtained using only the textual query against the results obtained combining textual and visual
information. Finally, we have used different methods to merge the visual and textual result lists.
    The next section describes the ad hoc experiments. In Section 3, we explain the experiments for
the medical task. Finally, conclusions and future work are presented in Section 4.


2     The Ad Hoc Task
The goal of the ad hoc task is, given a multilingual query, to find as many relevant images as
possible in an image collection.
    The purpose of the ad hoc task is to compare results with and without pseudo-relevance feed-
back, with or without query expansion, and using different methods of query translation or
different retrieval models and weighting functions [2].

2.1    Experiments Description
In our experiments we have used seven languages: Dutch, English, French, German, Italian,
Portuguese and Spanish.
   Because the 2005 results were quite good, this year we have used the same IR system and the
same strategies, but introducing a new translation module. This module combines several Machine
Translators and implements several heuristics.
   The Machine Translators used are the following (in brackets, the languages for which each one
is the default translator):
    • Epals (German and Portuguese)
    • Prompt (Spanish)
    • Reverso (French)
    • Systran (Dutch and Italian)

    Some heuristics are, for instance, the use of the translation produced by the default
translator, a combination of the translations of every translator, or a combination of the words
with the highest score (two points if a word appears in the default translation and one point if
it appears in all of the other translations). A sketch of this last heuristic is given below.
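
    As an illustration, the following minimal Python sketch shows one way to implement the
word-scoring heuristic just described; the function name and the selection criterion are our
assumptions, not the system's actual implementation.

```python
def combine_translations(default_translation, other_translations):
    """Sketch of the word-scoring heuristic: a word receives two points
    if it appears in the default translation and one point if it appears
    in all of the other translations. Keeping the positively-scored words
    as the combined translation is an assumption."""
    default_words = set(default_translation.lower().split())
    other_word_sets = [set(t.lower().split()) for t in other_translations]
    all_words = default_words.union(*other_word_sets)

    scores = {}
    for word in all_words:
        score = 2 if word in default_words else 0
        if other_word_sets and all(word in ws for ws in other_word_sets):
            score += 1
        scores[word] = score

    # Return the highest-scoring words, best first.
    return [w for w, s in sorted(scores.items(), key=lambda x: -x[1]) if s > 0]
```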
        Experiment                  Initial Query   Expansion     Weight    MAP      Rank
        sinaiEnEnFbOkapiExp1        title + narr    with          Okapi     0.2234    9/49
        sinaiEnEnFbOkapiExp2        title + narr    without       Okapi     0.0845   38/49
        sinaiEnEnFbOkapiExp3        title + narr    with          Tfidf     0.0846   37/49
        sinaiEnEnFbOkapiExp4        title + narr    without       Tfidf     0.0823   39/49

              Table 1: Summary of results for the English monolingual ad hoc runs

        Experiment                  Initial Query   Expansion     Weight    MAP      Rank
        sinaiDeEnFbOkapiExp1        title + narr    with          Okapi     0.1602    4/8
        sinaiDeEnFbOkapiExp2        title + narr    without       Okapi     0.1359    7/8
        sinaiDeEnFbOkapiExp3        title + narr    with          Tfidf     0.1489    5/8
        sinaiDeEnFbOkapiExp4        title + narr    without       Tfidf     0.1369    6/8

           Table 2: Summary of results for the German-English bilingual ad hoc runs


    The dataset is a new collection: IAPR TC-12. The IAPR TC-12 image collection consists of
20,000 images taken from locations around the world, comprising a varied cross-section of still
natural images. It includes pictures of a range of sports and actions, photographs of people,
animals, cities, landscapes and many other aspects of contemporary life.
    The collection has been preprocessed by removing stopwords and applying Porter's stemmer.
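
    A minimal sketch of this preprocessing step, assuming NLTK's English stopword list and
Porter stemmer stand in for the actual resources used by the system:

```python
# Requires: nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))

def preprocess(text):
    """Lowercase the text, drop stopwords and stem the remaining terms."""
    tokens = text.lower().split()
    return [stemmer.stem(t) for t in tokens if t not in stop_words]

print(preprocess("Pictures of people playing sports"))
# ['pictur', 'peopl', 'play', 'sport']
```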
    The collection dataset has been indexed using the LEMUR IR system. It is a toolkit that supports
indexing of large-scale text databases, the construction of simple language models for documents,
queries, or subcollections, and the implementation of retrieval systems based on language models
as well as a variety of other retrieval models. The toolkit is being developed as part of the
Lemur Project, a collaboration between the Computer Science Department at the University of
Massachusetts and the School of Computer Science at Carnegie Mellon University.
    One parameter for each experiment is the weighting function, such as Okapi or TF-IDF. Another
is the use (or not) of PRF (pseudo-relevance feedback).

2.2    Results and Discussion
All the results are obtained using the title and the narrative text, when available. In the
English monolingual task and in the German-English bilingual task we have combined the use (or
not) of pseudo-relevance feedback with the weighting function (Okapi or TF-IDF).
    In Table 1, we can see the English monolingual results. They show that pseudo-relevance
feedback is very important when Okapi is used as the weighting function; the results with TF-IDF,
and with Okapi without PRF, are very poor.
    Table 2 shows a summary of the experiments submitted and the results obtained for the
German-English bilingual runs. In this case we have combined the same parameters as in the
monolingual task.
    The results show a loss of MAP of around 28% between the best monolingual experiment and the
best bilingual one. Even so, the other English monolingual results are considerably worse than
the German-English bilingual ones.
    Finally, Table 3 shows a summary of the experiments submitted and the results obtained for
the other five bilingual runs.
    The results show that, in general, there is a loss of precision compared to the English
monolingual results. The Spanish result is around 17% worse, and the other languages show larger
decreases.


3     The Medical Task
The main goal of the medical ImageCLEF task is to improve the retrieval of medical images from
heterogeneous and multilingual document collections containing images as well as text. Queries
 Language       Experiment                 Initial Query   Expansion     Weight    MAP       Rank
 Dutch          sinaiNlEnFbOkapiExp1       title + narr    with          Okapi     0.1261     4/4
 French         sinaiFrEnFbOkapiExp1       title + narr    with          Okapi     0.1617     5/8
 Italian        sinaiItEnFbOkapiExp1       title + narr    with          Okapi     0.1216    13/15
 Portuguese     sinaiPtEnFbOkapiExp1       title + narr    with          Okapi     0.0728     7/7
 Spanish        sinaiEsEnFbOkapiExp1       title + narr    with          Okapi     0.1849     4/7

               Table 3: Summary of results for the other five bilingual ad hoc runs


are formulated with sample images and a short textual description explaining the research goal.
For the medical task, we have used the lists of images retrieved by GIFT1 , which were supplied
by the organizers of this track.
    Last year, our efforts concentrated on manipulating the text descriptions associated with
these images and on mixing the partial result lists with the GIFT lists [3]. However, this year
our experiments focus on preprocessing the collection using Information Gain (IG) in order to
improve the quality of the results and to automate the tag selection process.

3.1    Preprocessing the Collection
In order to generate the textual collection we have used the ImageCLEFmed.xml file that links
collections with their images and annotations. It has external links to the images and the associated
annotations in XML files. It contains relative paths, from the root directory, to all the related
files.
     The entire collection consists of 4 datasets (CASImage, Pathopic, Peir and MIR) containing
about 50,000 images. Each subcollection is organized into cases that represent a group of related
images and annotations. For every case, a group of images and an optional annotation are given.
Each image is part of a case and has optional associated annotations, which enclose metadata
and/or a textual annotation. All of the images and annotations are stored in separate files;
ImageCLEFmed.xml only contains the connections between collections, cases, images, and annotations.
     The collection annotations are in XML format. The majority of the annotations are in English,
but a significant number are in French (in the CASImage collection) and German (in the Pathopic
collection), and a few cases do not contain any annotation at all. The quality of the texts
varies across collections and even within the same collection.
     For the MIR subset, specifically designed regular expressions have been applied in order to
extract different segments of information, due to the lack of predefined XML tags. In this way,
information such as the identifier string, authors, date and so on has been extracted from the corpus.
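
     The exact expressions are not reproduced in the paper; the following hypothetical Python
sketch only illustrates the kind of field extraction applied to the MIR annotations (the field
labels and patterns are assumptions):

```python
import re

# Hypothetical patterns: the actual MIR field labels may differ.
FIELD_PATTERNS = {
    "id":      re.compile(r"^ID:\s*(\S+)", re.MULTILINE),
    "authors": re.compile(r"^Authors?:\s*(.+)$", re.MULTILINE),
    "date":    re.compile(r"^Date:\s*([\d/.-]+)", re.MULTILINE),
}

def extract_fields(raw_annotation):
    """Return a dict with one entry per field found in the raw text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(raw_annotation)
        if match:
            fields[name] = match.group(1).strip()
    return fields
```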
     We generate one textual document per image, where the document identifier is the name of the
image and the text of the document is the XML annotation associated with that image. If there are
several images for the same case, the case text is copied into each of their documents.
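
     A minimal sketch of this document generation step, assuming the cases have already been
parsed into (image names, annotation text) pairs:

```python
def generate_documents(cases):
    """Build one textual document per image. `cases` is assumed to be
    an iterable of (image_names, annotation_text) pairs; images that
    share a case receive a copy of the same annotation text."""
    documents = {}
    for image_names, annotation_text in cases:
        for image_name in image_names:
            # The image name acts as the document identifier.
            documents[image_name] = annotation_text
    return documents
```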
     We have used English for the document collection as well as for the queries. Thus, the French
annotations in the CASImage collection were translated into English and then incorporated into
the collection. The Pathopic collection has annotations in both English and German; we only used
the English annotations to generate the Pathopic documents, discarding the German ones.

3.2    Information Gain and Tag Selection
Last year, almost all tags were used to generate the final corpus; only those labels that seemed
not to provide any information, like the LANGUAGE tag, were removed. This year, however, the tags
have been selected according to the amount of information they theoretically supply. For this, we
have used the information gain measure as a method to select the best tags in the collection.
  1 http://www.gnu.org/software/gift/
    The main goal was to determine whether a corpus whose tags have been reduced by discarding
those with low IG may show higher performance levels. The aim is to eliminate those tags that do
not provide further information or that introduce noise, thereby degrading the results.
    Initially, experiments with only the 10%, 20%, 30%, ..., 100% of labels with the highest
associated IG were performed, using the 2005 data for evaluation. Once the results were analyzed,
the most accurate results turned out to be those obtained with 20%, 30% and 40% of the available
tags, and these are the collections used in the experiments submitted to the 2006 campaign.
    The method applied consists in computing the information gain of every tag in every sub-
collection. Since each subcollection (CASImage, Pathopic, Peir and MIR) has a different set of
tags, the information gain was calculated using each subcollection as scope, isolating each one
from the others. Let $C$ be the set of cases and $E$ the set of values taken by the tag $E$; then
the formula applied is as follows:

$$IG(C|E) = H(C) - H(C|E) \qquad (1)$$
   where

      IG(C|E)     is the information gain for the tag E,
        H(C)      is the entropy of C, and
       H(C|E)     is the entropy of C conditioned on the tag E (conditional entropy)
   In order to calculate this value, we compute the entropy of the set of cases $C$, assuming
every case is equally likely, i.e. $p(c_i) = 1/|C|$:
$$H(C) = -\sum_{i=1}^{|C|} p(c_i) \log_2 p(c_i) = -\sum_{i=1}^{|C|} \frac{1}{|C|} \log_2 \frac{1}{|C|} = -\log_2 \frac{1}{|C|} \qquad (2)$$
   And the entropy of the set of cases C conditioned by the tag E would be:

                           |E|
                           X          µ       |Cej |
                                              X                           ¶        |E|
                                                                                   X
                               |Cej |               1             1                    |Cej |            1
                H(C|E) =                  −                log2               =−                log2            (3)
                           j=1
                               |C|            i=1
                                                  |C e j |      |C ej |            j=1
                                                                                       |C|             |Cej |

   where

      $C_{e_j}$   is the subset of cases in C having the tag E set to the value $e_j$ (this
                  value is a combination of words where order does not matter)
   Therefore, substituting (2) and (3) into (1), the final equation for the information gain
supplied by a given tag $E$ over the set of cases $C$ is:
$$IG(C|E) = -\log_2 \frac{1}{|C|} + \sum_{j=1}^{|E|} \frac{|C_{e_j}|}{|C|} \log_2 \frac{1}{|C_{e_j}|} \qquad (4)$$

   For every tag in every collection, its information gain is computed. Then, the tags selected
to compose the final collection are those showing the highest values of IG. Once the document
collection was generated, experiments were conducted with the LEMUR2 information retrieval
system, applying the KL-divergence weighting scheme. A sketch of the selection procedure follows.
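
   A compact sketch of the tag-selection procedure, assuming each subcollection has been parsed
into a list of cases mapping tag names to values; the helper names are ours, not LEMUR's:

```python
import math
from collections import Counter

def information_gain(cases, tag):
    """IG(C|E) for one tag over a subcollection, following Eq. (4).
    A tag value is normalized to a frozenset of words, since word order
    does not matter. Cases missing the tag are ignored here (an
    assumption; the paper does not specify their treatment)."""
    n = len(cases)
    values = Counter(
        frozenset(str(case[tag]).split()) for case in cases if tag in case
    )
    h_c = -math.log2(1.0 / n)  # H(C), Eq. (2)
    h_c_given_e = -sum(        # H(C|E), Eq. (3)
        (count / n) * math.log2(1.0 / count) for count in values.values()
    )
    return h_c - h_c_given_e

def select_tags(cases, percentage):
    """Keep the given percentage of tags with the highest IG."""
    tags = {tag for case in cases for tag in case}
    ranked = sorted(tags, key=lambda t: information_gain(cases, t), reverse=True)
    keep = max(1, round(len(ranked) * percentage / 100))
    return ranked[:keep]
```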

3.3     Experiment Description
Our main goal is to investigate the effectiveness of filtering tags using IG in the text
collection. For this, we have carried out several experiments on the ImageCLEFmed 2005 data in
order to determine the best tag percentage.
  2 http://www.lemurproject.org/
                      Experiment                            Precision   Rank
                      IPAL Textual CDW (best result)         0.2646       1
                      SinaiOnlytL30                          0.1178      19
                      SinaiOnlytL40                          0.1133      20
                      SinaiOnlytL20                          0.0990      21

           Table 4: Performance of official runs in Medical Image Retrieval (text only)


    First, we carried out experiments with 10%, 20%, ..., 100% of the tags and evaluated the
results with the relevance assessments of the 2005 collection. Based on the results obtained, we
have only submitted runs with 20%, 30% and 40% of the tags for the 2006 collection, because these
corpora reported the best results. Thus, for each experiment we have submitted 3 runs (one per
corpus, generated at 20%, 30% and 40% of all available tags).
    We also wanted to compare the results obtained when only the text associated with the query
topic is used against the results obtained when visual and textual information are merged. For
this, a first experiment has been performed as the baseline case. This experiment simply consists
of taking the text associated with each query as a new textual query. Each textual query is then
submitted to the LEMUR system, and the resulting list is directly the baseline run.
    The remaining experiments start from the ranked lists provided by the GIFT tool. The orga-
nization provides a list of relevant images generated by GIFT for each query. For each list/query
we have performed an automatic textual query expansion using the text associated with the
top-ranked images of the GIFT list. Thus, we have added the text associated with the first four
images of the GIFT list to the original textual query in order to generate a new textual query.
The new textual query is then submitted to the LEMUR system and we obtain a new ranked list.
Thus, for each original query we have 2 partial lists: one (expanded) text list and one GIFT
list. The last step consists of merging these partial lists using some strategy in order to
obtain one final list (FL) with relevant images ranked by relevance. The merging process was done
by giving different weights of importance to the visual (VL) and textual (TL) lists:

$$FL = VL \cdot \alpha + TL \cdot \beta, \quad \text{with } \alpha + \beta = 1 \qquad (5)$$
    In order to set these parameters we again ran some experiments on the 2005 collection,
varying α and β in the range [0,1] with step 0.1 (i.e., 0, 0.1, 0.2, ..., 0.9 and 1). After
analyzing the results, we submitted runs with β set to 0.5, 0.6 and 0.7 for the 2006 collection.
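
    The sketch below illustrates both steps under stated assumptions: the GIFT-based query
expansion with the annotations of the top four images, and the weighted merge of Eq. (5). It
assumes both lists carry per-image scores on comparable scales (any score normalization is our
assumption; the paper does not describe one):

```python
def expand_query(query_text, gift_ranking, image_texts, k=4):
    """Expand the textual query with the annotations of the top-k
    images of the GIFT list (k = 4 in the runs described above)."""
    top_texts = [image_texts[img] for img in gift_ranking[:k] if img in image_texts]
    return " ".join([query_text] + top_texts)

def merge_lists(visual_scores, textual_scores, beta):
    """Merge the visual (VL) and textual (TL) lists as in Eq. (5):
    FL = VL * alpha + TL * beta, with alpha = 1 - beta.
    Inputs are dicts mapping image ids to retrieval scores."""
    alpha = 1.0 - beta
    images = set(visual_scores) | set(textual_scores)
    merged = {
        img: alpha * visual_scores.get(img, 0.0)
             + beta * textual_scores.get(img, 0.0)
        for img in images
    }
    # Final list (FL), ranked by merged score, best first.
    return sorted(images, key=lambda img: merged[img], reverse=True)
```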
    These 3 merged experiments and the baseline experiment (which only uses the textual
information of the query) have been carried out over the 3 different corpora generated with 20%,
30% and 40% of the tags. All textual experiments have been run with LEMUR using pseudo-relevance
feedback and the KL-divergence weighting scheme, as pointed out previously. In summary, we have
submitted 12 runs (4 experiments over 3 corpora).

3.4    Results
In total, 31 text-only runs and 37 mixed runs were submitted to ImageCLEFmed 2006.
    Table 4 shows the results for text-only retrieval with the SINAI system. Unfortunately, due
to a processing mistake, all our mixed runs obtain the same results as the visual GIFT baseline
(0.0467). At the time of writing we are modifying our system in order to solve this problem.


4     Conclusions and Further Work
In this paper, we have presented the experiments carried out in our participation in the ImageCLEF
campaign.
   For the ad hoc task, we have tried a new Machine Translation module. The application of some
heuristics improves the bilingual results, but it is necessary to study the queries with the
poorest results in order to improve them. Our next work will be the improvement of the results in
the IR phase, applying new techniques for query expansion (using thesauri or web information) and
investigating other heuristics for the Machine Translation module.
   For the medical task, we have tried to apply Information Gain in order to improve the results.
Unfortunately, the performance obtained has been very poor. In addition, for the mixed runs our
system had a computing mistake and the results obtained are not conclusive. However, we consider
that Information Gain is a good idea and a widely used method to filter information without the
need for a manual tag selection. Thus, our next steps will focus on improving the visual lists
and the merging process.


5    Acknowledgements
This work has been partially supported by a grant from the Spanish Government, project R2D2
(TIC2003-07158-C04-04).


References
[1] Henning Müller, Thomas Deselaers, Thomas Lehmann, Paul Clough, William Hersh: Overview
    of the ImageCLEFmed 2006 medical retrieval and annotation tasks. In Proceedings of the Cross
    Language Evaluation Forum (CLEF 2006), 2006.

[2] Paul Clough, Michael Grubinger, Thomas Deselaers, Allan Hanbury, Henning Müller:
    Overview of the ImageCLEF 2006 photo retrieval and object annotation tasks. In Proceedings
    of the Cross Language Evaluation Forum (CLEF 2006), 2006.

[3] M.T. Martín-Valdivia, M.A. García-Cumbreras, M.C. Díaz-Galiano, L.A. Ureña-López,
    A. Montejo-Raez: SINAI at ImageCLEF 2005. In Proceedings of the Cross Language
    Evaluation Forum (CLEF 2005), 2005.