SINAI at ImageCLEF Wikipedia Retrieval task 2011: testing combined systems

Miguel Ángel García-Cumbreras, Manuel Carlos Díaz-Galiano, L. Alfonso Ureña-López, Javier Arias-Buendía

University of Jaén, SINAI Group, Campus las Lagunillas, A3, 23071 Jaén, Spain
{magc, mcdiaz, laurena, jarias}@ujaen.es

Abstract. Several studies have demonstrated that the use and integration of various knowledge sources improves the quality and efficiency of information systems. This paper presents the system developed by the SINAI research group for the ImageCLEF Wikipedia Retrieval task. Using only the English text associated with each image, or its translation from French or German, our system tests several combinations of the textual annotation tags and the combination of information retrieval (IR) systems. The main aims of this work are to study the influence of machine translation, the use of the different annotation tags, and the fusion of retrieval results from different IR systems. The results obtained show that the choice of annotation tags and IR system influences the results, and that some fusions of the retrieved lists can improve performance.

Keywords: Text Retrieval, Machine Translation, Indexing, Retrieved Lists Combination, IR Fusion

1 Introduction

The Wikipedia Retrieval task is an ad-hoc image retrieval task. Its aim is to investigate retrieval approaches over a large and heterogeneous collection of images and their annotations, extracted from Wikipedia1 [1]. In 2011 the collection is the same as the one used in the 2010 ImageCLEF Wikipedia task [2]; it is composed of 237,434 Wikipedia images with unstructured and noisy annotations in English, French and German.

Given an English query together with sample images and annotations, our text-based system uses the textual annotations to find relevant images in the Wikipedia image collection. We tested several combinations of the results obtained with the annotation tags, as well as the fusion of the results obtained with two IR systems.

The remainder of the paper is organized as follows: Section 2 describes the system; Section 3 presents the experiments and results; finally, conclusions and further work are given in Section 4.

1 Available at http://www.wikipedia.org/

2 Description of the system

An image from the Wikipedia collection contains some data, such as the name of the image file and annotation data. These annotations contain three optional tags (description, comment and caption), given in up to three languages (English, French and German), which are also optional. Fig. 1 shows an example of an image with metadata from the collection.

Fig. 1. Example of an image from the Wikipedia collection and its metadata.

Fig. 2 shows the general architecture of our system. The collection modules (on the left of the figure) work with the collection of annotations. The first module translates the non-English data into English, as described below. The second module processes the English data, applying stopword removal and stemming; the caption and description tags are extracted and different combinations of them are created. Then the IR systems Lucene and Lemur build different indexes from these tag combinations. The query modules (on the right of the figure) work with the queries. The first module preprocesses the query (stopword removal and stemming), and the query is then run against the indexes, obtaining different lists of retrieved documents.
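The preprocessing applied to both the annotations and the queries can be sketched as follows. This is a minimal illustration rather than the actual implementation: the paper only states that English stopword removal and Porter's stemmer were applied, so the choice of NLTK and the whitespace tokenization are assumptions made for the example.

```python
# Minimal sketch of the preprocessing step (stopword removal + Porter stemming).
# NLTK is an assumed choice of library; run nltk.download("stopwords") once
# before using the English stopword list.
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

STOPWORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(text):
    """Lowercase the text, drop English stopwords and stem the remaining tokens."""
    tokens = text.lower().split()  # naive whitespace tokenization, for illustration only
    return " ".join(stemmer.stem(t) for t in tokens if t not in STOPWORDS)

# The same function would be applied to caption/description annotations before
# indexing and to the query text before retrieval.
print(preprocess("A photograph of the old cathedral during its restoration"))
```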
In some cases a fusion module is applied to join the lists obtained from the IR systems.

Fig. 2. Architecture of the SINAI system.

Our text-based system uses English as its main language and, according to the language distribution of the collection, not all of the 237,434 Wikipedia images can be used: some of them have no annotations or their language is not determined. The following images were used in our experiments:

- English only: 70,127
- English and German: 26,880
- English and French: 20,747
- English, German and French: 22,899
- Translations from:
  o German only: 50,291
  o French only: 28,461
  o German and French: 9,646

In total, 140,653 English annotations were used and 88,398 non-English annotations were translated.

The first module performs the automatic translation of the non-English metadata. For this purpose we used the online machine translator Reverso2. Captions and descriptions were translated; comments were discarded because in previous experiments we had concluded that they were not useful.

The second module applies a typical text preprocessing to the text of the caption and description tags: English stopword removal and Porter's stemmer [3].

The third module uses two Information Retrieval (IR) systems, Lemur3 and Lucene4. Lucene is a high-performance, full-featured text search engine that has proven robust in several text retrieval tasks. The Lemur Toolkit is an open-source toolkit designed to facilitate research in language modelling and information retrieval; it supports different automatic indexing strategies and a variety of retrieval models. In both cases the default parameters were used; for Lemur, the Okapi weighting scheme with automatic feedback was applied.

2.1 Combining the lists

We implemented two simple methods to combine the lists of relevant documents retrieved by the IR systems.

The first method combines the lists obtained for each annotation tag. The baseline experiments were run using only the description and caption tags, producing two lists of relevant documents for each IR system. In a second step we applied a simple technique to merge these lists [5,6]: first, the relevance values of each list are normalized; then, a different weighting percentage is applied to each partial list; finally, the lists are merged into one final list, re-ranked according to the new relevance values obtained. The combinations tested are the following:

- description tag only
- 10% caption and 90% description
- 20% caption and 80% description
- …
- 80% caption and 20% description
- 90% caption and 10% description
- caption tag only

2 Available at http://www.reverso.net/
3 Available at http://www.lemurproject.org/
4 Available at http://lucene.apache.org/

All these combinations were indexed with Lucene and Lemur separately, and the queries were then run against each index. We carried out preliminary experiments with the ImageCLEF Wikipedia Retrieval 2010 framework (non-official results), and the best results were obtained with the combination of 80% caption and 20% description.
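The weighted merging just described can be illustrated with the following sketch. It is a minimal example under our own assumptions: the paper states that the relevance values of each list are normalized before weighting but does not specify the normalization, so the division by the maximum score is an assumption, and the document identifiers and scores are invented for illustration.

```python
# Minimal sketch of the weighted fusion of two relevant-document lists
# (e.g. 80% caption / 20% description, or 90% Lemur / 10% Lucene).
# Max-score normalization is an assumption; the ids and scores are illustrative.
def fuse(list_a, list_b, weight_a=0.8, weight_b=0.2):
    """list_a, list_b: dicts mapping document id -> retrieval score (RSV)."""
    def normalize(scores):
        top = max(scores.values(), default=1.0) or 1.0
        return {doc: s / top for doc, s in scores.items()}

    norm_a, norm_b = normalize(list_a), normalize(list_b)
    fused = {}
    for doc in set(norm_a) | set(norm_b):
        fused[doc] = weight_a * norm_a.get(doc, 0.0) + weight_b * norm_b.get(doc, 0.0)
    # Final list, re-ranked by the new combined relevance value
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Example: fusing a caption-based run with a description-based run (80%/20%)
caption_run = {"img_101": 7.2, "img_205": 5.9, "img_042": 3.1}
description_run = {"img_205": 4.4, "img_300": 2.0}
print(fuse(caption_run, description_run, 0.8, 0.2))
```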
The next section reports these experiments and results.

The second method fuses the lists of relevant documents obtained by the IR systems used. Lucene and Lemur results were combined for each query, with different weights for each list, as explained in the previous paragraph. Based on the experiments made with the 2010 framework, the results show that Lemur works better than Lucene. The best combined result was obtained with the fusion of 90% Lemur and 10% Lucene, although the improvement over the global Lemur result was not significant. The next section also reports these experiments and results.

3 Experiment description and results

This section describes some previous experiments carried out within the ImageCLEF Wikipedia framework that formed the basis of the system developed. Due to their number, the previous results are not shown in full. As described previously, the dataset used contains 140,653 English annotations and 88,398 translated non-English annotations (229,051 in total). The complete collection contains 237,434 Wikipedia images, so we used about 96% of the total collection.

3.1 Previous experiments

Some previous experiments were run with the 2010 framework (the same collection and 70 different queries) and evaluated using the relevance judgments provided. The main aims were to find the best combination of tags (caption, description and comments) and to test whether a simple fusion of the lists of relevant documents returned by the IR systems Lemur and Lucene improves the results. The following experiments were carried out (with Reverso as the automatic translator):

- Lucene IR system with different combinations of the caption and description tags:
  o caption tag
  o description tag
  o caption + description tags
  o 10% caption + 90% description
  o …
  o 90% caption + 10% description
- Lemur IR system with the caption tag and different weighting functions (Okapi with feedback, KL divergence and KL divergence with feedback).
- Lemur IR system with Okapi and feedback and the same combinations of caption and description tags described above.
- Lemur IR system with Okapi and feedback combined with Lucene, with different weights for each list of relevant documents (from 10% Lemur and 90% Lucene to 90% Lemur and 10% Lucene).

Table 1 shows some relevant results obtained with the 2010 framework:

- Exp1: Lucene without feedback. Caption tag.
- Exp2: Lucene without feedback. Description tag.
- Exp3: Lucene without feedback. Caption and description tags.
- Exp4: Lucene without feedback. 80% caption and 20% description tags.
- Exp5: Lemur with Okapi and feedback. Caption tag.
- Exp6: Lemur with Okapi and feedback. Description tag.
- Exp7: Lemur with Okapi and feedback. Caption and description tags.
- Exp8: Lemur with Okapi and feedback. 80% caption and 20% description tags.
- Exp9 (best combination): 90% Lemur with Okapi and feedback and 10% Lucene without feedback. Caption and description tags.

All the results are evaluated using the Mean Average Precision (MAP) measure. The last column shows the improvement or decrease, in percentage, of each MAP result over the corresponding Lucene or Lemur baseline.

Table 1. Some results obtained with the 2010 framework.

Experiment   MAP      % improvement
Exp1         0.1488   100%
Exp2         0.0728   49%
Exp3         0.1422   96%
Exp4         0.1628   109%
Exp5         0.1665   100%
Exp6         0.0698   42%
Exp7         0.1991   120%
Exp8         0.1991   120%
Exp9 BC      0.2003   101%

The best textual run in 2010 was submitted by XRCE with a MAP of 0.2361 [4].
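For reference, the Mean Average Precision measure used in Tables 1 and 2 can be computed as in the following sketch; the queries, rankings and relevance judgments shown are invented for illustration only and are not taken from the task data.

```python
# Minimal sketch of Mean Average Precision (MAP): for each query, precision is
# accumulated at every rank where a relevant document appears and divided by the
# number of relevant documents; MAP is the mean of these values over all queries.
def average_precision(ranked_docs, relevant):
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs, qrels):
    """runs: query id -> ranked list of doc ids; qrels: query id -> set of relevant ids."""
    return sum(average_precision(runs[q], qrels[q]) for q in runs) / len(runs)

# Illustrative toy data (not taken from the task)
runs = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d5"]}
qrels = {"q1": {"d1", "d7"}, "q2": {"d5", "d9"}}
print(mean_average_precision(runs, qrels))  # approximately 0.4167
```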
After the analysis of these results we drew the following conclusions:

- Lemur with feedback works better than Lucene.
- The caption tag works well, while the description tag alone decreases the results. The joint use of the caption and description tags improves the baseline results, and so does a percentage-weighted combination of both tags.
- The combination of both IR systems with the caption and description tags slightly improves the textual result.

We analyzed some features of the queries, such as the number of words in the query and the relevant documents obtained with each IR system, and some features of the retrieved lists of relevant documents, such as the RSV (Retrieval Status Value). After this analysis we concluded that these features cannot determine which IR system to use, or which tag or combination of tags to use.

3.2 Results with the 2011 framework

Based on the previous results, we ran six experiments with the 2011 framework to test some features of our system. These six official runs are the following:

- Exp1: Lucene without feedback. Caption tag.
- Exp2: Lemur with Okapi and feedback. Caption tag.
- Exp3: Lucene without feedback. Caption and description tags.
- Exp4: Lucene without feedback. 80% caption and 20% description tags.
- Exp5: Lemur with Okapi and feedback. Caption and description tags.
- Exp6: 90% Lemur with Okapi and feedback and 10% Lucene without feedback. Caption and description tags.

Table 2 shows the official runs and results for ImageCLEF Wikipedia Retrieval 2011.

Table 2. Official runs and results for ImageCLEF Wikipedia Retrieval 2011.

Experiment   MAP
Exp1         0.1496
Exp2         0.1732
Exp3         0.1200
Exp4         0.1618
Exp5         0.2052
Exp6         0.2068

The best result was obtained with the combination of IR systems, using the caption and description tags, but the improvement over the use of Lemur with the caption and description tags is not significant. Some conclusions drawn from these official results are the following:

- Lemur outperforms Lucene in all the experiments.
- The caption tag introduces almost all of the relevant documents, while the description tag adds few relevant documents of its own.
- The combination of tags improves the results.
- The fusion of the lists of relevant documents retrieved by the IR systems improves the MAP result only slightly.

4 Conclusions and further work

In this paper we present the experiments and results obtained with the system we developed for the ImageCLEF Wikipedia Retrieval task. We tested some techniques with the 2010 framework and applied them to the official runs in 2011. In this participation we tested some textual features, such as different IR systems, the importance of the annotation tags, and some simple fusion methods to combine these tags and the lists of relevant documents retrieved. Our results show that the IR systems do not obtain the same results, that some tags are more important than others, and that their combination improves the final results.

As further work we want to analyze the results in depth, grouping queries by topic, by length, etc. We will also test the influence of the translation with different automatic translators. An important future aim is to train a machine learning system with relevant features to decide what to apply in order to improve the results (combination of IR systems, external knowledge, or the annotations from the different tags).

Acknowledgments.
This work has been supported by the Regional Government of Andalucía (Spain) under the excellence project GeOasis (P08-41999), by the Spanish Government under the project TEXT-COOL 2.0 (TIN2009-13391-C04-02), and by the local project RFC/UJA2009/12/14.

References

1. Tsikrika, T., Popescu, A., Kludas, J.: Overview of the Wikipedia Image Retrieval Task at ImageCLEF 2011. In: CLEF 2011 Working Notes, Amsterdam, The Netherlands (2011)
2. Popescu, A., Tsikrika, T., Kludas, J.: Overview of the Wikipedia Retrieval Task at ImageCLEF 2010. In: CLEF (Notebook Papers/LABs/Workshops) (2010)
3. Porter, M.F.: An algorithm for suffix stripping. In: Readings in Information Retrieval, pp. 313-316. Morgan Kaufmann Publishers Inc. (1997). ISBN 1-55860-454-5
4. Clinchant, S., Csurka, G., Ah-Pine, J., Jacquet, G., Perronnin, F., Sánchez, J., Minoukadeh, K.: XRCE's Participation in Wikipedia Retrieval, Medical Image Modality Classification and Ad-hoc Retrieval Tasks of ImageCLEF 2010. In: CLEF (Notebook Papers/LABs/Workshops) (2010)
5. Díaz-Galiano, M.C., García-Cumbreras, M.Á., Martín-Valdivia, M.T., Montejo-Ráez, A.: Knowledge integration using textual information for improving ImageCLEF collections. In: ImageCLEF, pp. 295-313. Springer Berlin Heidelberg (2010). ISBN 978-3-642-15181-1, DOI: 10.1007/978-3-642-15181-1_16
6. Díaz-Galiano, M.C., Martín-Valdivia, M.T., Montejo-Ráez, A., Ureña-López, L.A.: Improving Performance of Medical Images Retrieval by Combining Textual and Visual Information. In: Artificial Intelligence - Special Session, MICAI 2007, pp. 185-192 (2007). DOI: 10.1109/MICAI.2007.12