 CLEF 2009: Grid@CLEF Pilot Track Overview

                        Nicola Ferro1 and Donna Harman2
       1
           Department of Information Engineering, University of Padua, Italy
                                 ferro@dei.unipd.it
           2
             National Institute of Standards and Technology (NIST), USA
                               donna.harman@nist.gov



      Abstract. The Grid@CLEF track is a long-term activity that aims to
      run a series of systematic experiments in order to improve our under-
      standing of MLIA systems and to gain a comprehensive picture of their
      behaviour with respect to languages.
      Grid@CLEF 2009 is a pilot track that takes the first steps in this direc-
      tion by letting participants gain experience with the new way of carrying
      out experimentation that Grid@CLEF requires in order to test all the
      different combinations of IR components and languages. Grid@CLEF
      2009 offered traditional monolingual ad-hoc tasks in 5 languages (Dutch,
      English, French, German, and Italian), which made use of consolidated
      and very well known collections from CLEF 2001 and 2002 and a set of
      84 topics.
      Participants had to conduct experiments according to the CIRCO frame-
      work, an XML-based protocol which allows for a distributed, loosely-
      coupled, and asynchronous experimental evaluation of IR systems. We
      provided a Java library which can be exploited to implement CIRCO
      and an example implementation based on the Lucene IR system.
      Participation was especially challenging also because of the size of the
      XML files generated by CIRCO, which can grow to 50-60 times the size
      of the collection. Of the 9 initially subscribed participants, only 2 were
      able to submit runs in time, and we received a total of 18 runs in 3 of
      the 5 offered languages (English, French, and German). The two par-
      ticipants used different IR systems, or combinations of them, namely
      Lucene, Terrier, and Cheshire II.


1   Introduction
Much of the effort of Cross-Language Evaluation Forum (CLEF) over the years
has been devoted to the investigation of key questions such as “What is Cross
Language Information Retrieval (CLIR)?”, “What areas should it cover?” and
“What resources, tools and technologies are needed?” In this respect, the Ad
Hoc track has always been considered the core track in CLEF and has been
the starting point for many groups as they began to develop functionality for
multilingual information access. Thanks to this pioneering work, CLEF has
produced, over the years, the groundwork and foundations needed to start
considering, today, how to go deeper and address even more challenging
issues [12,13].
    The Grid@CLEF Pilot track1 takes the first steps in this direction and aims
at [11]:

 – looking at differences across a wide set of languages;
 – identifying best practices for each language;
 – helping other countries to develop their expertise in the Information Retrieval
   (IR) field and create IR groups;
 – providing a repository, in which all the information and knowledge derived
   from the experiments undertaken can be managed and made available via
   the Distributed Information Retrieval Evaluation Campaign Tool (DIRECT)
   system.

    The Grid@CLEF pilot track in CLEF 2009 has provided us with an oppor-
tunity to begin to set up a suitable framework for carrying out a first set of
experiments, which allow us to acquire an initial set of measurements and to
start exploring the interaction among IR components and languages. This
initial knowledge will allow us to tune the overall protocol and framework, to
understand which directions are more promising, and to scale the experiments
up to a finer-grained comprehension of the behaviour of IR components across
languages.
    The paper is organized as follows: Section 2 provides an overview of the ap-
proach and the issues that need to be faced in Grid@CLEF; Section 3 introduces
CIRCO, the framework we are developing in order to enable the Grid@CLEF ex-
periments; Section 4 describes the experimental setup that has been adopted for
Grid@CLEF 2009; Section 5 presents the main outcomes of this year's Grid@CLEF
in terms of participation and performances achieved; finally, Section 6 discusses
the different approaches and findings of the participants in Grid@CLEF.


2     Grid@CLEF Approach

Individual researchers or small groups do not usually have the possibility of
running large-scale and systematic experiments over a large set of experimental
collections and resources. Figure 1 depicts the performance, e.g. mean average
precision, of the composition of different IR components across a set of languages
as a kind of surface which we intend to explore with our experiments. The
average CLEF participant, shown in Figure 1(a), may only be able to sample
a few points on this surface since, for example, they usually test just a few
variations of their own or customary IR model with a stemmer for two or three
languages. Instead, the expert CLEF participant, represented in Figure 1(b),
may have the expertise and competence to test all the possible variations of
a given component across a set of languages, as [22] does for stemmers, thus
investigating a good slice of the surface.
    However, even though each of these cases produces valuable research results
and contributes to the advancement of the discipline, they are both still far
1
    http://ims.dei.unipd.it/gridclef/
[Figure 1: two surface plots of Mean Average Precision over a "Components" axis
(ranging from Stop List, Stemmer, and Word de-compounder up to Pivot Language and
Merging Strategies) and a "Language" axis (from Bulgarian to Swedish).]

                     (a) Average CLEF participants.

                     (b) Expert CLEF participant.

           Fig. 1. Coverage achieved by different kinds of participants.
[Figure 2: the same Mean Average Precision surface over the "Components" and
"Language" axes, this time sampled by a fine-grained grid of points.]

               Fig. 2. The three main entities involved in grid experiments.


removed from a clear and complete comprehension of the features and properties
of the surface. A far deeper sampling would be needed for this, as shown in
Figure 2: in this sense, Grid@CLEF will create a fine-grained grid of points over
this surface, hence the name of the track.
    It is our hypothesis that a series of systematic experiments can re-use and
exploit the valuable resources and experimental collections made available by
CLEF in order to gain more insight into the effectiveness of, for example, the
various weighting schemes and retrieval techniques with respect to languages.
    In order to do this, we must deal with the interaction of three main entities:

 – Component: in charge of carrying out one of the steps of the IR process;
 – Language: will affect the performance and behaviour of the different com-
   ponents of an Information Retrieval System (IRS) depending on its specific
   features, e.g. alphabet, morphology, syntax, and so on;
 – Task: will impact on the performance of IRS components according to its
   distinctive characteristics.

    We assume that the contributions of these three main entities to retrieval
performance tend to overlap; nevertheless, at present, we do not have enough
knowledge about this process to say whether, how, and to what extent these
entities interact and/or overlap – and how their contributions can be combined,
e.g. in a linear fashion or according to some more complex relation.
    The above issue is in direct relationship with another long-standing problem
in the IR experimentation: the impossibility of testing a single component inde-
pendently of a complete IRS. [16, p. 12] points out that “if we want to decide
between alternative indexing strategies for example, we must use these strate-
gies as part of a complete information retrieval system, and examine its overall
performance (with each of the alternatives) directly”. This means that we have
to proceed by changing only one component at a time and keeping all the others
fixed, in order to identify the impact of that component on retrieval effectiveness;
this also calls for the identification of suitable baselines with respect to which
comparisons can be made.


3   The CIRCO Framework
In order to run these grid experiments, we need to set up a framework in which
participants can exchange the intermediate output of the components of their
systems and create a run by using the output of the components of other par-
ticipants.
    For example, if the expertise of participant A is in building stemmers and
decompounders while participant B’s expertise is in developing probabilistic IR
models, we would like to make it possible for participant A to apply his stem-
mer to a document collection, pass the output to participant B, who tests his
probabilistic IR model, thus obtaining a final run which represents the test of
participant A's stemmer + participant B's probabilistic IR model.
    To this end, the objective of the Coordinated Information Retrieval Compo-
nents Orchestration (CIRCO) framework [10] is to allow for a distributed, loosely-
coupled, and asynchronous experimental evaluation of Information Retrieval (IR)
systems where:
 – distributed highlights that different stakeholders can take part in the exper-
   imentation, each one providing one or more components of the whole IR
   system to be evaluated;
 – loosely-coupled points out that minimal integration among the different com-
   ponents is required to carry out the experimentation;
 – asynchronous underlines that no synchronization among the different com-
   ponents is required to carry out the experimentation.
     The CIRCO framework allows different research groups and industrial par-
ties, each one with their own areas of expertise, to take part in the creation of
collaborative experiments. This is a radical departure from today’s IR evalua-
tion practice where each stakeholder has to develop (or integrate components to
build) an entire IR system to be able to run a single experiment.
     The basic idea – and assumption – behind CIRCO is to streamline the archi-
tecture of an IR system and represent it as a pipeline of components chained
together. The processing proceeds by passing the results of the computation of
one component as input to the next component in the pipeline, without branches,
i.e. no alternative paths are allowed in the chain.
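     A minimal sketch of this pipeline view is given below; the Component interface
and the LinearPipeline class are purely illustrative and are not part of the CIRCO
Java library.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative sketch only: this is NOT the CIRCO API, just a way to picture
// the "pipeline without branches" assumption, where each component maps a
// token stream to a token stream and feeds the next component.
public class LinearPipeline {

    public interface Component extends UnaryOperator<List<String>> { }

    private final List<Component> components;

    public LinearPipeline(List<Component> components) {
        this.components = components;
    }

    // Apply the components in order, passing the output of each one as the
    // input of the next; no alternative paths exist in the chain.
    public List<String> run(List<String> input) {
        List<String> current = input;
        for (Component c : components) {
            current = c.apply(current);
        }
        return current;
    }
}
```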
     To get an intuitive idea of the overall approach adopted in CIRCO, consider
the example pipeline shown in Figure 3(a).
     The example IR system is constituted by the following components:
 – tokenizer : breaks the input documents into a sequence of tokens;
 – stop word remover : removes stop words from the sequence of tokens;
 – stemmer : stems the tokens;
 – indexer : weights the tokens and stores them and the related information in
   an index.
[Figure 3: a pipeline made up of Tokenizer, Stop Word Remover, Stemmer, and Indexer
components. (a) An example pipeline for an IR system, where each component directly
feeds the next one. (b) An example of CIRCO pipeline for an IR system, where the
components exchange their intermediate results through XML files.]

Fig. 3. Example of CIRCO approach to distributed, loosely-coupled, and asynchronous
experimentation.


    Instead of each component directly feeding the next one, as usually happens
in an IR system, CIRCO requires each component to read its input from and
write its output to eXtensible Markup Language (XML) [25] files in a well-defined
format, as shown in Figure 3(b).
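    To make the idea concrete, the sketch below shows what a stand-alone, file-
mediated component could look like in Java: it reads the token list produced by
the previous component from an XML file and writes its own XML output for the
next one. The element names (tokens, t) and the case-folding step are invented
for illustration only; the actual exchange format is defined by the CIRCO
Schema [10].

```java
import javax.xml.stream.*;
import java.io.*;
import java.util.Locale;

// Illustrative sketch only: a stand-alone "case folding" component that reads
// the XML output of the previous component in the pipeline and writes its own
// XML output for the next one. The element names are hypothetical and do not
// follow the real CIRCO Schema.
public class CaseFoldingComponent {

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0]);      // previous component's output
             OutputStream out = new FileOutputStream(args[1])) { // this component's output

            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(in);
            XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");

            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeStartElement("tokens");

            // Copy every <t> element, applying this component's processing step.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "t".equals(reader.getLocalName())) {
                    writer.writeStartElement("t");
                    writer.writeCharacters(
                        reader.getElementText().toLowerCase(Locale.ROOT));
                    writer.writeEndElement();
                }
            }

            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
            reader.close();
        }
    }
}
```

Since such a component depends only on the XML files, it can be run whenever
and wherever its input becomes available, which is what makes the experimenta-
tion distributed, loosely-coupled, and asynchronous.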
    These XML files can then be exchanged among the different stakeholders
that are involved in the evaluation. In this way, we can meet the requirements
stated above by allowing for an experimentation that is:
 – distributed since different stakeholders can take part in the same experiment,
   each one providing his own component(s);
 – loosely-coupled since the different components do not need to be integrated
   into a whole and running IR system but only need to communicate by means
   of a well-defined XML format;
 – asynchronous since the different components do not need to operate all at
   the same time or immediately after the previous one but can exchange and
   process the XML files at different rates.
   In order to allow this way of conducting experiments, the CIRCO framework
consists of:
 – CIRCO Schema: an XML Schema [23,24] model which precisely defines the
   format of the XML files exchanged among stakeholders’ components;
 – CIRCO Web: an online system which manages the registration of stakehold-
   ers’ components, their description, and the exchange of XML messages;
 – CIRCO Java 2 : an implementation of CIRCO based on the Java3 program-
   ming language to facilitate its adoption and portability.
2
  The documentation is available at the following address:
  http://ims.dei.unipd.it/software/circo/apidoc/.
  The source code and the binary code are available at the following address:
  http://ims.dei.unipd.it/software/circo/jar/.
3
  http://java.sun.com/
   The choice of an XML-based exchange format is due to the fact that the main
alternative, i.e. developing a common Application Program Interface (API) that
IR systems have to comply with, presents some issues:

 – the experimentation would not be loosely-coupled, since all the IR systems
   would have to be coded with respect to the same API;
 – much more complicated solutions would be required to allow the distributed
   and asynchronous running of the experiments, since some kind of middleware
   for process orchestration and message delivery would be needed;
 – multiple versions of the API in different languages would have to be provided
   to take into account the different technologies used to develop IR systems;
 – the integration with legacy code could be problematic and require a lot of
   effort;
 – overall, stakeholders would be distracted from their main objective, which is
   running an experiment and evaluating a system.


4     Track Setup

The Grid@CLEF track offers a traditional ad-hoc task – see, for example, [1] –
which makes use of experimental collections developed according to the Cranfield
paradigm [6]. This first-year task focuses on monolingual retrieval, i.e. querying
topics against documents in the same language as the topics, in five European
languages:

 – Dutch;
 – English;
 – French;
 – German;
 – Italian.

   The selected languages allow participants to test both Romance and Germanic
languages, as well as languages with word-compounding issues. These languages
have been extensively studied in the MultiLingual Information Access (MLIA)
field and, therefore, it will be possible to compare and assess the outcomes of
the first-year experiments with respect to the existing literature.
   This first year track has a twofold goal:

1. to prepare participants’ systems to work according to the CIRCO framework;
2. to conduct as many experiments as possible, i.e. to put as many dots as
   possible on the grid.


4.1   Test Collections

Grid@CLEF 2009 used the test collection originally developed for the CLEF
2001 and 2002 campaigns [3,4].
        
        
         Identifier (DOI): 10.2452/125-AH

         Title (NL): Gemeenschapplijke Europese munt.
         Title (EN): European single currency
         Title (FR): La monnaie unique européenne
         Title (DE): Europäische Einheitswährung
         Title (IT): La moneta unica europea

         Description (NL): Wat is het geplande tijdschema voor de invoering van de
         gemeenschapplijke Europese munt?
         Description (EN): What is the schedule predicted for the European single currency?
         Description (FR): Quelles sont les prévisions pour la mise en place de la monnaie
         unique européenne?
         Description (DE): Wie sieht der Zeitplan für die Einführung einer europäischen
         Einheitswährung aus?
         Description (IT): Qual è il calendario previsto per la moneta unica europea?

         Narrative (NL): De veronderstellingen van politieke en economische persoonlijkheden
         wat betreft het tijdschema waarbinnen men zal komen tot de invoering van een
         gemeenschapplijke munt voor de Europese Unie zijn van belang.
         Narrative (EN): Speculations by politicians and business figures about a calendar
         for achieving a common currency in the EU are relevant.
         Narrative (FR): Les débats animés par des personnalités du monde politique et
         économique sur le calendrier prévisionnel pour la mise en œuvre de la monnaie
         unique dans l'Union Européenne sont pertinents.
         Narrative (DE): Spekulationen von Vertretern aus Politik und Wirtschaft über einen
         Zeitplan zur Einführung einer gemeinsamen europäischen Währung sind relevant.
         Narrative (IT): Sono rilevanti le previsioni, da parte di personaggi politici e
         dell'economia, sul calendario delle scadenze per arrivare a una moneta unica europea.



       Fig. 4. Example of topic http://direct.dei.unipd.it/10.2452/125-AH.



The Documents. Table 1 reports the document collections which have been
used for each of the languages offered for the track.


Topics. Topics are structured statements representing information needs. Each
topic typically consists of three parts: a brief “title” statement; a one-sentence
“description”; a more complex “narrative” specifying the relevance assessment
criteria. Topics are prepared in XML format and uniquely identified by means of
a Digital Object Identifier (DOI)4 .
    In Grid@CLEF 2009, we used 84 out of the 100 topics in the set 10.2452/41-AH–
10.2452/140-AH originally developed for CLEF 2001 and 2002, namely those that
have relevant documents in all the collections of Table 1, as detailed in Table 2.
    Figure 4 shows an example topic in all five languages.

4
    http://www.doi.org/
                        Table 1. Document collections.

Language   Collection                    Documents   Size (approx.)
Dutch      NRC Handelsblad 1994/95          84,121       291 Mbyte
           Algemeen Dagblad 1994/95        106,484       235 Mbyte
           Total                           190,605       526 Mbyte
English    Los Angeles Times 1994          113,005       420 Mbyte
French     Le Monde 1994                    44,013       154 Mbyte
           French SDA 1994                  43,178        82 Mbyte
           Total                            87,191       236 Mbyte
German     Frankfurter Rundschau 1994      139,715       319 Mbyte
           Der Spiegel 1994/95              13,979        61 Mbyte
           German SDA 1994                  71,677       140 Mbyte
           Total                           225,371       520 Mbyte
Italian    La Stampa 1994                   58,051       189 Mbyte
           Italian SDA 1994                 50,527        81 Mbyte
           Total                           108,578       270 Mbyte




                              Table 2. Topics.

    Language   Topics without relevant documents
    Dutch      (none)
    English    10.2452/54-AH, 10.2452/57-AH, 10.2452/60-AH, 10.2452/93-AH,
               10.2452/96-AH, 10.2452/101-AH, 10.2452/110-AH, 10.2452/117-AH,
               10.2452/118-AH, 10.2452/127-AH, 10.2452/132-AH
    French     10.2452/64-AH
    German     10.2452/44-AH
    Italian    10.2452/43-AH, 10.2452/52-AH, 10.2452/64-AH, 10.2452/120-AH
                      Table 3. Grid@CLEF 2009 participants.

           Participant   Institution                         Country
           chemnitz      Chemnitz University of Technology   Germany
           cheshire      U.C. Berkeley                       United States


Relevance Assessments. The same relevance assessments developed for CLEF
2001 and 2002 have been used; for further information see [3,4].

4.2    Result Calculation
Evaluation campaigns such as TREC and CLEF are based on the belief that the
effectiveness of IRSs can be objectively evaluated by an analysis of a representa-
tive set of sample search results. For this, effectiveness measures are calculated
based on the results submitted by the participants and the relevance assess-
ments. Popular measures usually adopted for exercises of this type are Recall
and Precision. Details on how they are calculated for CLEF are given in [5]. We
used trec_eval5 version 8.0 to compute the performance measures.
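    As a reminder of what these measures summarize, the sketch below shows
how (non-interpolated) average precision – the per-topic quantity whose mean
over all topics gives the MAP values reported in Section 5 – can be computed
from a ranked result list and the relevance assessments. It is a simplified
illustration, not a re-implementation of trec_eval.

```java
import java.util.List;
import java.util.Set;

// Simplified illustration of average precision; the mean of this value over
// all topics of a task gives the MAP reported for each run. trec_eval also
// computes interpolated precision at standard recall levels, precision at
// fixed cut-offs, and many other measures.
public class AveragePrecision {

    // ranking: document identifiers in retrieval order (best first).
    // relevant: identifiers judged relevant for the topic.
    public static double averagePrecision(List<String> ranking, Set<String> relevant) {
        int retrievedRelevant = 0;
        double sum = 0.0;
        for (int i = 0; i < ranking.size(); i++) {
            if (relevant.contains(ranking.get(i))) {
                retrievedRelevant++;
                sum += (double) retrievedRelevant / (i + 1); // precision at this rank
            }
        }
        // Dividing by the total number of relevant documents penalizes
        // relevant documents that were not retrieved at all.
        return relevant.isEmpty() ? 0.0 : sum / relevant.size();
    }

    public static void main(String[] args) {
        // Toy example: relevant documents retrieved at ranks 1 and 3, out of 3 relevant.
        double ap = averagePrecision(
            List.of("d1", "d7", "d3", "d9"),
            Set.of("d1", "d3", "d5"));
        System.out.printf("AP = %.4f%n", ap); // (1/1 + 2/3) / 3 ≈ 0.5556
    }
}
```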
    The individual results for all official Grid@CLEF experiments in CLEF 2009
are given in the Appendices of the CLEF 2009 Working Notes [8]. You can also
access them online at:
 – monolingual English:
   http://direct.dei.unipd.it/DOIResolver.do?type=task&id=GRIDCLEF-MONO-EN-CLEF2009
 – monolingual French:
   http://direct.dei.unipd.it/DOIResolver.do?type=task&id=GRIDCLEF-MONO-FR-CLEF2009
 – monolingual German:
   http://direct.dei.unipd.it/DOIResolver.do?type=task&id=GRIDCLEF-MONO-DE-CLEF2009

5     Track Outcomes
5.1    Participants and Experiments
As shown in Table 3, a total of 2 groups from 2 different countries submitted
official results for one or more of the Grid@CLEF 2009 tasks.
    Participants were required to submit at least one title+description (“TD”)
run per task in order to increase comparability between experiments: all the
18 submitted runs used this combination of topic fields. A breakdown into the
separate tasks is shown in Table 4.
    Participation in this first year was especially challenging because of the need
to modify existing systems to implement the CIRCO framework. Moreover, it
was also challenging from a computational point of view since, for each compo-
nent in an IR pipeline, CIRCO could produce XML files that are 50-60 times
the size of the original collection; this greatly increased the indexing time and
the time needed to submit runs and deliver the corresponding XML files.
5
    http://trec.nist.gov/trec_eval
              Table 4. Breakdown of experiments into tasks and topic languages.

                                Task          # Participants # Runs
                         Monolingual Dutch                 0      0
                         Monolingual English               2      6
                         Monolingual French                2      6
                         Monolingual German                2      6
                         Monolingual Italian               0      0
                                         Total                   18



   5.2     Results

   Table 5 shows the top runs for each target collection, ordered by mean average
   precision. The table reports: the short name of the participating group; the mean
   average precision achieved by the experiment; the DOI of the experiment; and
   the performance difference between the first and the last participant.
      Figure 5 compares the performances of the top participants of the Grid@CLEF
   monolingual tasks.


   6      Approaches and Discussion

Chemnitz [9] approached their participation in Grid@CLEF within the wider
context of the creation of an archive of audiovisual media which can be jointly
used by German TV stations, stores both raw material and produced and
broadcast material, and needs to be described as comprehensively as possible
in order to be easily searchable. In this context, they have developed the Xtrieval
system, which aims to be flexible and easily configurable so that it can be adjusted
to different corpora, multimedia search tasks, and kinds of annotation. Chemnitz
tested both the vector space model [20,19], as implemented by Lucene6 , and
    6
        http://lucene.apache.org/


                          Table 5. Best entries for the Grid@CLEF tasks.

 Track    Rank        Participant  Experiment DOI                                                                    MAP
 English  1st         chemnitz     10.2415/GRIDCLEF-MONO-EN-CLEF2009.CHEMNITZ.CUT GRID MONO EN MERGED LUCENE TERRIER   54.45%
          2nd         cheshire     10.2415/GRIDCLEF-MONO-EN-CLEF2009.CHESHIRE.CHESHIRE GRID ENG T2FB                   53.13%
          Difference                                                                                                    2.48%
 French   1st         cheshire     10.2415/GRIDCLEF-MONO-FR-CLEF2009.CHESHIRE.CHESHIRE GRID FRE T2FB                   51.88%
          2nd         chemnitz     10.2415/GRIDCLEF-MONO-FR-CLEF2009.CHEMNITZ.CUT GRID MONO FR MERGED LUCENE TERRIER   49.42%
          Difference                                                                                                    4.97%
 German   1st         chemnitz     10.2415/GRIDCLEF-MONO-DE-CLEF2009.CHEMNITZ.CUT GRID MONO DE MERGED LUCENE TERRIER   48.64%
          2nd         cheshire     10.2415/GRIDCLEF-MONO-DE-CLEF2009.CHESHIRE.CHESHIRE GRID GER T2FB                   40.02%
          Difference                                                                                                   21.53%
[Figure 5(a): Monolingual English – standard recall levels vs. mean interpolated precision
for chemnitz (experiment CUT_GRID_MONO_EN_MERGED_LUCENE_TERRIER; MAP 54.46%; not pooled)
and cheshire (experiment CHESHIRE_GRID_ENG_T2FB; MAP 53.13%; not pooled).]

[Figure 5(b): Monolingual French – standard recall levels vs. mean interpolated precision
for cheshire (experiment CHESHIRE_GRID_FRE_T2FB; MAP 51.88%; not pooled)
and chemnitz (experiment CUT_GRID_MONO_FR_MERGED_LUCENE_TERRIER; MAP 49.42%; not pooled).]

[Figure 5(c): Monolingual German – standard recall levels vs. mean interpolated precision
for chemnitz (experiment CUT_GRID_MONO_DE_MERGED_LUCENE_TERRIER; MAP 48.64%; not pooled)
and cheshire (experiment CHESHIRE_GRID_GER_T2FB; MAP 40.03%; not pooled).]

 Fig. 5. Recall-precision graph for Grid@CLEF tasks.
BM25 [17,18], as implemented by Terrier7 , in combination with Snowball8 and
Savoy’s [21] stemmers. They found that the impact of retrieval techniques is
highly dependent on the corpus and quite unpredictable and that, even if over
the years they have learned how to guess reasonable configurations for their
system in order to get good results, there is still a need for “strong rules which
let us predict the retrieval quality . . . [and] enable us to automatically configure
a retrieval engine in accordance to the corpus”. This was their motivation for
participating in Grid@CLEF 2009, which they regarded as a first step in this
direction.
    Cheshire [14] participated in Grid@CLEF with their Cheshire II system based
on logistic regression [7]. Their interest was in understanding what happens
when you try to separate the processing elements of IR systems and look at their
intermediate output, taking this as an opportunity to re-analyse and improve
their system and, possibly, to find a way to incorporate into Cheshire II com-
ponents of other IR systems for subtasks which they currently cannot perform,
or cannot perform effectively, such as decompounding German words. They also
found that “the same algorithms and processing systems can have radically dif-
ferent performance on different collections and query sets”. Finally, the partici-
pation in Grid@CLEF actually allowed Cheshire to improve their system and to
put forward some suggestions for the next Grid@CLEF, concerning support for
the creation of multiple indexes according to the structure of a document and
specific indexing tasks related to geographic information retrieval, such as geo-
graphic name extraction and geo-referencing.


Acknowledgements
The authors would like to warmly thank the members of the Grid@CLEF Advi-
sory Committee – Martin Braschler, Chris Buckley, Fredric Gey, Kalervo Järvelin,
Noriko Kando, Craig Macdonald, Prasenjit Majumder, Paul McNamee, Teruko
Mitamura, Mandar Mitra, Stephen Robertson, and Jacques Savoy – for the use-
ful discussions and suggestions.
    The work reported has been partially supported by the TrebleCLEF Coor-
dination Action, within FP7 of the European Commission, Theme ICT-1-4-1
Digital Libraries and Technology Enhanced Learning (Contract 215231).


References
 1. E. Agirre, G. M. Di Nunzio, N. Ferro, T. Mandl, and C. Peters. CLEF 2008: Ad
    Hoc Track Overview. In F. Borri, A. Nardi, and C. Peters, editors, Working Notes
    for the CLEF 2008 Workshop. http://www.clef-campaign.org/2008/working_
    notes/adhoc-final.pdf [last visited 2008, September 10], 2008.
 2. F. Borri, A. Nardi, and C. Peters, editors. Working Notes for the CLEF 2009
    Workshop. Published Online, 2009.
7
    http://ir.dcs.gla.ac.uk/terrier/index.html
8
    http://snowball.tartarus.org/
 3. M. Braschler. CLEF 2001 – Overview of Results. In Peters et al. [15], pages 9–26.
 4. M. Braschler. CLEF 2002 – Overview of Results. In C. Peters, M. Braschler,
    J. Gonzalo, and M. Kluck, editors, Advances in Cross-Language Information Re-
    trieval: Third Workshop of the Cross–Language Evaluation Forum (CLEF 2002)
    Revised Papers, pages 9–27. Lecture Notes in Computer Science (LNCS) 2785,
    Springer, Heidelberg, Germany, 2003.
 5. M. Braschler and C. Peters. CLEF 2003 Methodology and Metrics. In C. Peters,
    M. Braschler, J. Gonzalo, and M. Kluck, editors, Comparative Evaluation of Multi-
    lingual Information Access Systems: Fourth Workshop of the Cross–Language Eval-
    uation Forum (CLEF 2003) Revised Selected Papers, pages 7–20. Lecture Notes in
    Computer Science (LNCS) 3237, Springer, Heidelberg, Germany, 2004.
 6. C. W. Cleverdon. The Cranfield Tests on Index Languages Devices. In
    K. Spärck Jones and P. Willett, editors, Readings in Information Retrieval, pages
    47–60. Morgan Kaufmann Publisher, Inc., San Francisco, CA, USA, 1997.
 7. W. S. Cooper, F. C. Gey, and D. P. Dabney. Probabilistic Retrieval Based on
    Staged Logistic Regression. In N. J. Belkin, P. Ingwersen, A. Mark Pejtersen, and
    E. A. Fox, editors, Proc. 15th Annual International ACM SIGIR Conference on
    Research and Development in Information Retrieval (SIGIR 1992), pages 198–210.
    ACM Press, New York, USA, 1992.
 8. G. M. Di Nunzio and N. Ferro. Appendix D: Results of the Grid@CLEF Track.
    In Borri et al. [2].
 9. M. Eibl and J. Kürsten. The Importance of being Grid – Chemnitz University of
    Technology at Grid@CLEF. In Borri et al. [2].
10. N. Ferro. Specification of the CIRCO Framework, Version 0.10. Technical Re-
    port IMS.2009.CIRCO.0.10, Department of Information Engineering, University
    of Padua, Italy, 2009.
11. N. Ferro and D. Harman. Dealing with MultiLingual Information Access: Grid
    Experiments at TrebleCLEF. In M. Agosti, F. Esposito, and C. Thanos, editors,
    Post-proceedings of the Fourth Italian Research Conference on Digital Library Sys-
    tems (IRCDL 2008), pages 29–32. ISTI-CNR at Gruppo ALI, Pisa, Italy, 2008.
12. N. Ferro and C. Peters. From CLEF to TrebleCLEF: the Evolution of the Cross-
    Language Evaluation Forum. In N. Kando and M. Sugimoto, editors, Proc. 7th
    NTCIR Workshop Meeting on Evaluation of Information Access Technologies: In-
    formation Retrieval, Question Answering and Cross-Lingual Information Access,
    pages 577–593. National Institute of Informatics, Tokyo, Japan, 2008.
13. N. Ferro and C. Peters. CLEF Ad-hoc: A Perspective on the Evolution of the
    Cross-Language Evaluation Forum. In M. Agosti, F. Esposito, and C. Thanos,
    editors, Post-proceedings of the Fifth Italian Research Conference on Digital Li-
    brary Systems (IRCDL 2009), pages 72–79. DELOS Association and Department
    of Information Engineering of the University of Padua, 2009.
14. R. R. Larson.        Decomposing Text Processing for Retrieval: Cheshire tries
    GRID@CLEF. In Borri et al. [2].
15. C. Peters, M. Braschler, J. Gonzalo, and M. Kluck, editors. Evaluation of Cross-
    Language Information Retrieval Systems: Second Workshop of the Cross–Language
    Evaluation Forum (CLEF 2001) Revised Papers. Lecture Notes in Computer Sci-
    ence (LNCS) 2406, Springer, Heidelberg, Germany, 2002.
16. S. E. Robertson. The methodology of information retrieval experiment. In
    K. Spärck Jones, editor, Information Retrieval Experiment, pages 9–31. Butter-
    worths, London, United Kingdom, 1981.
17. S. E. Robertson and K. Spärck Jones. Relevance Weighting of Search Terms.
    Journal of the American Society for Information Science (JASIS), 27(3):129–146,
    May/June 1976.
18. S. E. Robertson, S. Walker, and M. Beaulieu. Experimentation as a way of life:
    Okapi at TREC. Information Processing & Management, 36(1):95–108, January
    2000.
19. G. Salton and C. Buckley. Term-weighting Approaches in Automatic Text Re-
    trieval. Information Processing & Management, 24(5):513–523, 1988.
20. G. Salton, A. Wong, and C. S. Yang. A Vector Space Model for Automatic Index-
    ing. Communications of the ACM (CACM), 18(11):613–620, November 1975.
21. J. Savoy. A Stemming Procedure and Stopword List for General French Corpora.
    Journal of the American Society for Information Science (JASIS), 50(10):944–952,
    January 1999.
22. J. Savoy. Report on CLEF-2001 Experiments: Effective Combined Query-
    Translation Approach. In Peters et al. [15], pages 27–43.
23. W3C. XML Schema Part 1: Structures – W3C Recommendation 28 October 2004.
    http://www.w3.org/TR/xmlschema-1/, October 2004.
24. W3C. XML Schema Part 2: Datatypes – W3C Recommendation 28 October 2004.
    http://www.w3.org/TR/xmlschema-2/, October 2004.
25. W3C. Extensible Markup Language (XML) 1.0 (Fifth Edition) – W3C Recom-
    mendation 26 November 2008. http://www.w3.org/TR/xml/, November 2008.