<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CLEF 2009: Grid@CLEF Pilot Track Overview</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nicola Ferro</string-name>
          <email>ferro@dei.unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donna Harman</string-name>
          <email>donna.harman@nist.gov</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padua</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Standards and Technology (NIST)</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2009</year>
      </pub-date>
      <abstract>
        <p>The Grid@CLEF track is a long-term activity whose aim is to run a series of systematic experiments in order to improve the comprehension of MLIA systems and gain an exhaustive picture of their behaviour with respect to languages. In particular, Grid@CLEF 2009 is a pilot track that has taken the first steps in this direction by giving participants the possibility of gaining experience with the new way of carrying out experimentation that is needed in Grid@CLEF to test all the different combinations of IR components and languages. Grid@CLEF 2009 offered traditional monolingual ad-hoc tasks in 5 different languages (Dutch, English, French, German, and Italian), which make use of consolidated and very well-known collections from CLEF 2001 and 2002, and used a set of 84 topics. Participants had to conduct experiments according to the CIRCO framework, an XML-based protocol which allows for a distributed, loosely-coupled, and asynchronous experimental evaluation of IR systems. We provided a Java library which can be exploited to implement CIRCO and an example implementation with the Lucene IR system. The participation has been especially challenging also because of the size of the XML files generated by CIRCO, which can become 50-60 times the size of the collection. Of the 9 initially subscribed participants, only 2 were able to submit runs in time, and we received a total of 18 runs in 3 of the 5 offered languages (English, French, and German). The two participants used different IR systems or combinations of them, namely Lucene, Terrier, and Cheshire II.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Much of the effort of Cross-Language Evaluation Forum (CLEF) over the years
has been devoted to the investigation of key questions such as “What is Cross
Language Information Retrieval (CLIR)?”, “What areas should it cover?” and
“What resources, tools and technologies are needed?” In this respect, the Ad
Hoc track has always been considered the core track in CLEF and it has been
the starting point for many groups as they begin to be interested in developing
functionality for multilingual information access. Thanks to this pioneering
work, CLEF produced, over the years, the necessary groundwork and foundations
to be able, today, to start wondering how to go deeper and to address even more
challenging issues [
        <xref ref-type="bibr" rid="ref12 ref13">12,13</xref>
        ].
      </p>
      <p>
        The Grid@CLEF Pilot track1 takes the first steps in this direction and aims
at [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]:
– looking at differences across a wide set of languages;
– identifying best practices for each language;
– helping other countries to develop their expertise in the Information Retrieval
(IR) field and create IR groups;
– providing a repository, in which all the information and knowledge derived
from the experiments undertaken can be managed and made available via
the Distributed Information Retrieval Evaluation Campaign Tool (DIRECT)
system.
      </p>
      <p>The Grid@CLEF pilot track in CLEF 2009 has provided us with an
opportunity to begin to set up a suitable framework in order to carry out a first set
of experiments which allows us to acquire an initial set of measurements and
to start to explore the interaction among IR components and languages. This
initial knowledge will allow us to tune the overall protocol and framework, to
understand what directions are more promising, and to scale the experiments
up to a finer-grain comprehension of the behaviour of IR components across
languages.</p>
      <p>The paper is organized as follows: Section 2 provides an overview of the
approach and the issues that need to be faced in Grid@CLEF; Section 3 introduces
CIRCO, the framework we are developing in order to enable the Grid@CLEF
experiments; Section 4 describes the experimental setup that has been adopted for
Grid@CLEF 2009; Section 5 presents the main outcomes of this year’s Grid@CLEF
in terms of participation and performances achieved; finally, Section 6 discusses
the different approaches and findings of the participants in Grid@CLEF.</p>
      <p>Individual researchers or small groups do not usually have the possibility of
running large-scale and systematic experiments over a large set of experimental
collections and resources. Figure 1 depicts the performances, e.g. mean average
precision, of the composition of different IR components across a set of languages
as a kind of surface area which we intend to explore with our experiments. The
average CLEF participant, shown in Figure 1(a), may only be able to sample
a few points on this surface since, for example, they usually test just a few
variations of their own or a customary IR model with a stemmer for two or three
languages. Instead, the expert CLEF participant, represented in Figure 1(b),
may have the expertise and competence to test all the possible variations of
a given component across a set of languages, as [22] does for stemmers, thus
investigating a good slice of the surface area.</p>
      <p>However, even though each of these cases produces valuable research results
and contributes to the advancement of the discipline, both are still far
removed from a clear and complete comprehension of the features and properties
of the surface.</p>
    </sec>
    <sec id="sec-2">
      <title>1 http://ims.dei.unipd.it/gridclef/</title>
      <p>[Figure 1: the performance surface (e.g. mean average precision) given by composing IR components (stop list, stemmer, word de-compounder, Boolean, vector space, probabilistic, language, and divergence-from-randomness models, pre- and post-translation relevance feedback, machine-readable dictionaries, machine translation, parallel and aligned corpora, word sense disambiguation, pivot languages, merging strategies) across languages: (a) the average CLEF participant; (b) the expert CLEF participant.]</p>
      <p>A far deeper sampling would be needed for this, as shown in
Figure 2: in this sense, Grid@CLEF will create a fine-grained grid of points over
this surface; hence the name of the track.</p>
      <p>It is our hypothesis that a series of systematic experiments can re-use and
exploit the valuable resources and experimental collections made available by
CLEF in order to gain more insights about the effectiveness of, for example, the
various weighting schemes and retrieval techniques with respect to the languages.</p>
      <p>In order to do this, we must deal with the interaction of three main entities:
– Component: in charge of carrying out one of the steps of the IR process;
– Language: will affect the performance and behaviour of the different
components of an Information Retrieval System (IRS) depending on its specific
features, e.g. alphabet, morphology, syntax, and so on;
– Task: will impact on the performances of IRS components according to its
distinctive characteristics.</p>
      <p>We assume that the contributions of these three main entities to retrieval
performance tend to overlap; nevertheless, at present, we do not have enough
knowledge about this process to say whether, how, and to what extent these
entities interact and/or overlap – and how their contributions can be combined,
e.g. in a linear fashion or according to some more complex relation.</p>
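In Grid@CLEF terms, each choice of one option per component for a given language is a single point of the grid to be measured. As a rough sketch of how such a grid could be enumerated (in Java, with hypothetical component names — the real grid spans many more components, such as stop lists, decompounders, and translation resources):

```java
import java.util.*;

public class ExperimentGrid {
    // Builds one experiment identifier per point of the grid:
    // every (language, stemmer, model) combination.
    public static List<String> grid(List<String> languages,
                                    List<String> stemmers,
                                    List<String> models) {
        List<String> points = new ArrayList<>();
        for (String lang : languages)
            for (String stem : stemmers)
                for (String model : models)
                    points.add(lang + "/" + stem + "/" + model);
        return points;
    }

    public static void main(String[] args) {
        // Hypothetical component choices for illustration only.
        List<String> points = grid(
            Arrays.asList("nl", "en", "fr", "de", "it"),
            Arrays.asList("none", "snowball", "savoy"),
            Arrays.asList("vector-space", "bm25"));
        System.out.println(points.size() + " experiments to run"); // 5 * 3 * 2 = 30
    }
}
```

Even this tiny toy grid yields 30 monolingual experiments, which gives an idea of why no single group can cover the surface alone.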
      <p>The above issue is directly related to another long-standing problem
in IR experimentation: the impossibility of testing a single component
independently of a complete IRS. [16, p. 12] points out that “if we want to decide
between alternative indexing strategies for example, we must use these
strategies as part of a complete information retrieval system, and examine its overall
performance (with each of the alternatives) directly”. This means that we have
to proceed by changing only one component at a time while keeping all the others
fixed, in order to identify the impact of that component on retrieval effectiveness;
this also calls for the identification of suitable baselines with respect to which
comparisons can be made.</p>
      <sec id="sec-2-1">
        <title>The CIRCO Framework</title>
        <p>In order to run these grid experiments, we need to set up a framework in which
participants can exchange the intermediate output of the components of their
systems and create a run by using the output of the components of other
participants.</p>
        <p>For example, if the expertise of participant A is in building stemmers and
decompounders while participant B’s expertise is in developing probabilistic IR
models, we would like to make it possible for participant A to apply his
stemmer to a document collection and pass the output to participant B, who tests his
probabilistic IR model, thus obtaining a final run which represents the test of
participant A’s stemmer + participant B’s probabilistic IR model.</p>
        <p>
          To this end, the objective of the Coordinated Information Retrieval
Components Orchestration (CIRCO) framework [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] is to allow for a distributed,
loosely-coupled, and asynchronous experimental evaluation of Information Retrieval (IR)
systems where:
– distributed highlights that different stakeholders can take part in the
experimentation, each providing one or more components of the whole IR
system to be evaluated;
– loosely-coupled points out that minimal integration among the different
components is required to carry out the experimentation;
– asynchronous underlines that no synchronization among the different
components is required to carry out the experimentation.
        </p>
        <p>The CIRCO framework allows different research groups and industrial
parties, each one with their own areas of expertise, to take part in the creation of
collaborative experiments. This is a radical departure from today’s IR
evaluation practice where each stakeholder has to develop (or integrate components to
build) an entire IR system to be able to run a single experiment.</p>
        <p>The basic idea – and assumption – behind CIRCO is to streamline the
architecture of an IR system and represent it as a pipeline of components chained
together. The processing proceeds by passing the results of the computations of
a component as input to the next component in the pipeline without branches,
i.e. no alternative paths are allowed in the chain.</p>
        <p>To get an intuitive idea of the overall approach adopted in CIRCO, consider
the example pipeline shown in Figure 3(a).</p>
        <p>The example IR system is constituted by the following components:
– tokenizer : breaks the input documents into a sequence of tokens;
– stop word remover : removes stop words from the sequence of tokens;
– stemmer : stems the tokens;
– indexer : weights the tokens and stores them and the related information in
an index.
</p>
        <p>[Figure 3: (a) an example pipeline for an IR system (Tokenizer → Stop Word Remover → Stemmer → Indexer); (b) an example of CIRCO pipeline for the same IR system, with XML files exchanged between components.]</p>
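To make the pipeline of Figure 3(a) concrete, here is a minimal in-memory sketch in Java; the three-word stop list and the one-line suffix stripper are toy stand-ins for real components such as a Snowball stemmer, and the "indexer" merely counts term frequencies:

```java
import java.util.*;
import java.util.stream.*;

public class Pipeline {
    static final Set<String> STOP = new HashSet<>(Arrays.asList("the", "a", "of"));

    // Breaks the input text into a sequence of lower-cased tokens.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+"));
    }

    // Removes stop words from the sequence of tokens.
    static List<String> removeStopWords(List<String> tokens) {
        return tokens.stream().filter(t -> !STOP.contains(t))
                     .collect(Collectors.toList());
    }

    // Toy suffix stripping, standing in for a real stemmer.
    static List<String> stem(List<String> tokens) {
        return tokens.stream()
                     .map(t -> t.endsWith("s") ? t.substring(0, t.length() - 1) : t)
                     .collect(Collectors.toList());
    }

    // The indexer ends the chain: here it just builds term frequencies.
    static Map<String, Integer> index(List<String> tokens) {
        Map<String, Integer> tf = new TreeMap<>();
        for (String t : tokens) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    public static void main(String[] args) {
        // Each component directly feeds the next one, as in Figure 3(a).
        Map<String, Integer> idx = index(stem(removeStopWords(tokenize(
            "The schedules of the European currencies"))));
        System.out.println(idx);
    }
}
```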
        <p>Instead of directly feeding the next component as usually happens in an
IR system, CIRCO operates by requiring each component to input and output
from/to eXtensible Markup Language (XML) [25] files in a well-defined format,
as shown in Figure 3(b).</p>
        <p>These XML files can then be exchanged among the different stakeholders
that are involved in the evaluation. In this way, we can meet the requirements
stated above by allowing for an experimentation that is:
– distributed since different stakeholders can take part in the same experiment,
each one providing his own component(s);
– loosely-coupled since the different components do not need to be integrated
into a whole and running IR system but only need to communicate by means
of a well-defined XML format;
– asynchronous since the different components do not need to operate all at
the same time or immediately after the previous one but can exchange and
process the XML files at different rates.</p>
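A single CIRCO-style stage can then be pictured as a program that consumes the XML produced by the previous stage and emits XML for the next one. The sketch below is illustrative only: the `<token>` element name is invented here, since the actual exchange format is defined by the CIRCO Schema:

```java
import java.util.*;
import java.util.regex.*;

public class StopWordStage {
    private static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("the", "of", "a", "an", "and"));
    private static final Pattern TOKEN = Pattern.compile("<token>([^<]+)</token>");

    // Reads the XML output of the previous stage (e.g. a tokenizer) and
    // emits the same format with stop words removed, so that the next stage
    // (e.g. a stemmer) can consume it without knowing who produced it.
    public static String process(String inputXml) {
        StringBuilder out = new StringBuilder("<tokens>\n");
        Matcher m = TOKEN.matcher(inputXml);
        while (m.find()) {
            String token = m.group(1);
            if (!STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) {
                out.append("  <token>").append(token).append("</token>\n");
            }
        }
        return out.append("</tokens>").toString();
    }

    public static void main(String[] args) {
        String in = "<tokens><token>The</token><token>European</token>"
                  + "<token>single</token><token>currency</token></tokens>";
        System.out.println(process(in));
    }
}
```

Because the stage only touches files in the agreed format, the group running it never needs to integrate its code with anyone else's system.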
        <p>In order to allow this way of conducting experiments, the CIRCO framework
consists of:
– CIRCO Schema: an XML Schema [23,24] model which precisely defines the
format of the XML files exchanged among stakeholders’ components;
– CIRCO Web: an online system which manages the registration of
stakeholders’ components, their description, and the exchange of XML messages;
– CIRCO Java2: an implementation of CIRCO based on the Java3
programming language to facilitate its adoption and portability.
2 The documentation is available at the following address:
http://ims.dei.unipd.it/software/circo/apidoc/.</p>
        <p>The source code and the binary code are available at the following address:
http://ims.dei.unipd.it/software/circo/jar/.
3 http://java.sun.com/</p>
        <p>The choice of using an XML-based exchange format is due to the fact that the
main alternative, i.e. to develop a common Application Program Interface
(API) with which IR systems have to comply, presents some issues:
– the experimentation would not be loosely-coupled, since all the IR systems
would have to be coded with respect to the same API;
– much more complicated solutions would be required for allowing the
distributed and asynchronous running of the experiments, since you would need
some kind of middleware for process orchestration and message delivery;
– multiple versions of the API in different programming languages should be provided
to take into account the different technologies used to develop IR systems;
– the integration with legacy code could be problematic and require a lot of
effort;
– overall, stakeholders would be distracted from their main objective, which is
running an experiment and evaluating a system.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Track Setup</title>
        <p>
          The Grid@CLEF track offers a traditional ad-hoc task – see, for example, [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] –
which makes use of experimental collections developed according to the Cranfield
paradigm [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This first year’s task focuses on monolingual retrieval, i.e. querying
topics against documents in the same language as the topics, in five European
languages:
– Dutch;
– English;
– French;
– German;
– Italian.
        </p>
        <p>The selected languages allow participants to test both Romance and Germanic
languages, as well as languages with word compounding issues. These languages
have been extensively studied in the MultiLingual Information Access (MLIA)
field and, therefore, it will be possible to compare and assess the outcomes of
the first year experiments with respect to the existing literature.</p>
        <p>This first year track has a twofold goal:
1. to prepare participants’ systems to work according to the CIRCO framework;
2. to conduct as many experiments as possible, i.e. to put as many dots as
possible on the grid.</p>
        <sec id="sec-2-2-1">
          <title>Test Collections</title>
          <p>
            Grid@CLEF 2009 used the test collection originally developed for the CLEF
2001 and 2002 campaigns [
            <xref ref-type="bibr" rid="ref3 ref4">3,4</xref>
            ].
          </p>
          <p>&lt;?xml version="1.0" encoding="UTF-8" standalone="no"?&gt;
&lt;topic&gt;
&lt;identifier&gt;10.2452/125-AH&lt;/identifier&gt;
&lt;title lang="nl"&gt;Gemeenschapplijke Europese munt.&lt;/title&gt;
&lt;title lang="en"&gt;European single currency&lt;/title&gt;
&lt;title lang="fr"&gt;La monnaie unique européenne&lt;/title&gt;
&lt;title lang="de"&gt;Europäische Einheitswährung&lt;/title&gt;
&lt;title lang="it"&gt;La moneta unica europea&lt;/title&gt;
&lt;description lang="nl"&gt;Wat is het geplande tijdschema voor de invoering van de gemeenschapplijke Europese munt?&lt;/description&gt;
&lt;description lang="en"&gt;What is the schedule predicted for the European single currency?&lt;/description&gt;
&lt;description lang="fr"&gt;Quelles sont les prévisions pour la mise en place de la monnaie unique européenne?&lt;/description&gt;
&lt;description lang="de"&gt;Wie sieht der Zeitplan für die Einführung einer europäischen Einheitswährung aus?&lt;/description&gt;
&lt;description lang="it"&gt;Qual è il calendario previsto per la moneta unica europea?&lt;/description&gt;
&lt;/topic&gt;</p>
          <p>Topics consist of a brief “title”; a one-sentence
“description”; and a more complex “narrative” specifying the relevance assessment
criteria. Topics are prepared in XML format and uniquely identified by means of
a Digital Object Identifier (DOI)4.</p>
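Because topics are plain XML, a participant's system can extract the fields for its language with a few lines of standard DOM code. This sketch assumes the simplified topic structure shown above; `TopicReader` and its `field` method are illustrative names, not part of any track-provided library:

```java
import javax.xml.parsers.*;
import org.w3c.dom.*;
import java.io.*;

public class TopicReader {
    // Returns the text of the first element (e.g. "title" or "description")
    // whose lang attribute matches the requested language, or null if absent.
    public static String field(String topicXml, String element, String lang)
            throws Exception {
        DocumentBuilder db =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(
            new ByteArrayInputStream(topicXml.getBytes("UTF-8")));
        NodeList nodes = doc.getElementsByTagName(element);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (lang.equals(e.getAttribute("lang"))) {
                return e.getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String topic = "<topic><identifier>10.2452/125-AH</identifier>"
                     + "<title lang=\"en\">European single currency</title>"
                     + "<title lang=\"it\">La moneta unica europea</title></topic>";
        System.out.println(field(topic, "title", "en"));
    }
}
```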
          <p>In Grid@CLEF 2009, we used 84 out of 100 topics in the set 10.2452/41-AH–
10.2452/140-AH originally developed for CLEF 2001 and 2002 since they have
relevant documents in all the collections of Table 1, as detailed in Table 2.</p>
          <p>Table 1. Document collections used per language:
– Dutch: NRC Handelsblad 1994/95; Algemeen Dagblad 1994/95
– English: Los Angeles Times 1994
– French: Le Monde 1994; French SDA 1994
– German: Frankfurter Rundschau 1994; Der Spiegel 1994/95; German SDA 1994
– Italian: La Stampa 1994; Italian SDA 1994</p>
          <p>Table 2. Topics with no relevant documents, per language:
– Dutch: 10.2452/54-AH, 10.2452/57-AH, 10.2452/60-AH, 10.2452/93-AH, 10.2452/96-AH, 10.2452/101-AH, 10.2452/110-AH, 10.2452/117-AH, 10.2452/118-AH, 10.2452/127-AH, 10.2452/132-AH
– French: 10.2452/64-AH
– German: 10.2452/44-AH
– Italian: 10.2452/43-AH, 10.2452/52-AH, 10.2452/64-AH, 10.2452/120-AH</p>
          <p>Table 3. Participants:
– chemnitz: Chemnitz University of Technology, Germany
– cheshire: U.C. Berkeley, United States</p>
          <p>
            Relevance Assessment. The same relevance assessments developed for CLEF
2001 and 2002 have been used; for further information see [
            <xref ref-type="bibr" rid="ref3 ref4">3,4</xref>
            ].
          </p>
        </sec>
        <sec id="sec-2-2-2">
          <title>Result Calculation</title>
          <p>
            Evaluation campaigns such as TREC and CLEF are based on the belief that the
effectiveness of IRSs can be objectively evaluated by an analysis of a
representative set of sample search results. For this, effectiveness measures are calculated
based on the results submitted by the participants and the relevance
assessments. Popular measures usually adopted for exercises of this type are Recall
and Precision. Details on how they are calculated for CLEF are given in [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. We
used trec_eval5 version 8.0 to compute the performance measures.
          </p>
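For reference, the per-topic quantity underlying the MAP figures reported later can be sketched as follows; this illustrates the standard definition of average precision, not the trec_eval implementation itself:

```java
import java.util.*;

public class AveragePrecision {
    // ranking: document IDs in rank order; relevant: the judged-relevant IDs.
    // Average precision = mean of precision-at-rank over the ranks at which
    // relevant documents are retrieved; unretrieved relevant docs count as 0.
    public static double averagePrecision(List<String> ranking,
                                          Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        double sum = 0.0;
        int hits = 0;
        for (int i = 0; i < ranking.size(); i++) {
            if (relevant.contains(ranking.get(i))) {
                hits++;
                sum += (double) hits / (i + 1); // precision at this rank
            }
        }
        return sum / relevant.size();
    }

    public static void main(String[] args) {
        List<String> run = Arrays.asList("d3", "d1", "d7", "d2");
        Set<String> rel = new HashSet<>(Arrays.asList("d1", "d2"));
        // Relevant docs at ranks 2 and 4: (1/2 + 2/4) / 2 = 0.5
        System.out.println(averagePrecision(run, rel));
    }
}
```

MAP is then simply this value averaged over all topics of a task.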
          <p>
            The individual results for all official Grid@CLEF experiments in CLEF 2009
are given in the Appendices of the CLEF 2009 Working Notes [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]. You can also
access them online at:
– monolingual English: http://direct.dei.unipd.it/DOIResolver.do?type=
task&amp;id=GRIDCLEF-MONO-EN-CLEF2009
– monolingual French: http://direct.dei.unipd.it/DOIResolver.do?type=
task&amp;id=GRIDCLEF-MONO-FR-CLEF2009
– monolingual German: http://direct.dei.unipd.it/DOIResolver.do?type=
task&amp;id=GRIDCLEF-MONO-DE-CLEF2009
          </p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Track Outcomes</title>
        <sec id="sec-2-3-1">
          <title>Participants and Experiments</title>
          <p>As shown in Table 3, a total of 2 groups from 2 different countries submitted
official results for one or more of the Grid@CLEF 2009 tasks.</p>
          <p>Participants were required to submit at least one title+description (“TD”)
run per task in order to increase comparability between experiments: all the
18 submitted runs used this combination of topic fields. A breakdown into the
separate tasks is shown in Table 4.</p>
          <p>The participation in this first year was especially challenging because of the
need to modify existing systems to implement the CIRCO framework.
Moreover, it has been challenging also from the computational point of view since,
for each component in an IR pipeline, CIRCO could produce XML files that are
50-60 times the size of the original collection; this greatly increased the indexing
time and the time needed to submit runs and deliver the corresponding XML
files.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5 http://trec.nist.gov/trec_eval</title>
      <p>
        Chemnitz [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] approached their participation in Grid@CLEF within the wider
context of the creation of an archive of audiovisual media to be jointly
used by German TV stations; the archive stores both raw material and produced,
broadcast material, which needs to be described as comprehensively as possible
in order to be easily searchable. In this context, they have developed the Xtrieval
system, which aims to be flexible and easily configurable so that it can be adjusted
to different corpora, multimedia search tasks, and kinds of annotation. Chemnitz
tested both the vector space model [20,19], as implemented by Lucene6, and
      </p>
    </sec>
    <sec id="sec-4">
      <title>6 http://lucene.apache.org/</title>
      <p>Track results – rank, participant, experiment DOI, MAP:
– English: 1st, chemnitz, 10.2415/GRIDCLEF-MONO-EN-CLEF2009.CHEMNITZ.CUT_GRID_MONO_EN_MERGED_LUCENE_TERRIER, 54.45%; 2nd, cheshire, 10.2415/GRIDCLEF-MONO-EN-CLEF2009.CHESHIRE.CHESHIRE_GRID_ENG_T2FB, 53.13%; difference: 2.48%
– French: 1st, cheshire, 10.2415/GRIDCLEF-MONO-FR-CLEF2009.CHESHIRE.CHESHIRE_GRID_FRE_T2FB, 51.88%; 2nd, chemnitz, 10.2415/GRIDCLEF-MONO-FR-CLEF2009.CHEMNITZ.CUT_GRID_MONO_FR_MERGED_LUCENE_TERRIER, 49.42%; difference: 4.97%
– German: 1st, chemnitz, 10.2415/GRIDCLEF-MONO-DE-CLEF2009.CHEMNITZ.CUT_GRID_MONO_DE_MERGED_LUCENE_TERRIER, 48.64%; 2nd, cheshire, 10.2415/GRIDCLEF-MONO-DE-CLEF2009.CHESHIRE.CHESHIRE_GRID_GER_T2FB, 40.02%; difference: 21.53%</p>
      <p>[Figure 5: standard recall levels vs. mean interpolated precision for the top experiments in (a) monolingual English, (b) monolingual French, and (c) monolingual German.]</p>
      <p>Fig. 5. Recall-precision graph for Grid@CLEF tasks.</p>
      <p>BM25 [17,18], as implemented by Terrier7, in combination with Snowball8 and
Savoy’s [21] stemmers. They found that the impact of retrieval techniques is
highly dependent on the corpus and quite unpredictable and that, even if over
the years they have learned how to guess reasonable configurations for their
system in order to get good results, there is still the need for “strong rules which
let us predict the retrieval quality . . . [and] enable us to automatically configure
a retrieval engine in accordance to the corpus”. This was their motivation
to participate in Grid@CLEF 2009, which represented a first attempt that will
allow them to move in this direction.</p>
      <p>
        Cheshire [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] participated in Grid@CLEF with their Cheshire II system based
on logistic regression [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and their interest was in understanding what happens
when you try to separate the processing elements of IR systems and look at their
intermediate output, taking this as an opportunity to re-analyse and improve
their system and, possibly, to find a way to incorporate into Cheshire II
components of other IR systems for subtasks which they currently cannot perform or
cannot perform effectively, such as decompounding German words. They also found
that “the same algorithms and processing systems can have radically different
performance on different collections and query sets”. Finally, the participation in
Grid@CLEF actually allowed Cheshire to improve their system and to point out
some suggestions for the next Grid@CLEF, concerning the support for the
creation of multiple indexes according to the structure of a document and specific
indexing tasks related to geographic information retrieval, such as geographic
name extraction and geo-referencing.
      </p>
      <sec id="sec-4-1">
        <title>Acknowledgements</title>
        <p>The authors would like to warmly thank the members of the Grid@CLEF
Advisory Committee – Martin Braschler, Chris Buckley, Fredric Gey, Kalervo Järvelin,
Noriko Kando, Craig Macdonald, Prasenjit Majumder, Paul McNamee, Teruko
Mitamura, Mandar Mitra, Stephen Robertson, and Jacques Savoy – for the
useful discussions and suggestions.</p>
        <p>The work reported has been partially supported by the TrebleCLEF
Coordination Action, within FP7 of the European Commission, Theme ICT-1-4-1
Digital Libraries and Technology Enhanced Learning (Contract 215231).
7 http://ir.dcs.gla.ac.uk/terrier/index.html
8 http://snowball.tartarus.org/</p>
        <p>17. S. E. Robertson and K. Spärck Jones. Relevance Weighting of Search Terms.
Journal of the American Society for Information Science (JASIS), 27(3):129–146,
May/June 1976.
18. S. E. Robertson, S. Walker, and M. Beaulieu. Experimentation as a way of life:
Okapi at TREC. Information Processing &amp; Management, 36(1):95–108, January
2000.
19. G. Salton and C. Buckley. Term-weighting Approaches in Automatic Text
Retrieval. Information Processing &amp; Management, 24(5):513–523, 1988.
20. G. Salton, A. Wong, and C. S. Yang. A Vector Space Model for Automatic
Indexing. Communications of the ACM (CACM), 18(11):613–620, November 1975.
21. J. Savoy. A Stemming Procedure and Stopword List for General French Corpora.
Journal of the American Society for Information Science (JASIS), 50(10):944–952,
January 1999.
22. J. Savoy. Report on CLEF-2001 Experiments: Effective Combined Query
Translation Approach. In Peters et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], pages 27–43.
23. W3C. XML Schema Part 1: Structures – W3C Recommendation 28 October 2004.
http://www.w3.org/TR/xmlschema-1/, October 2004.
24. W3C. XML Schema Part 2: Datatypes – W3C Recommendation 28 October 2004.
http://www.w3.org/TR/xmlschema-2/, October 2004.
25. W3C. Extensible Markup Language (XML) 1.0 (Fifth Edition) – W3C
Recommendation 26 November 2008. http://www.w3.org/TR/xml/, November 2008.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>E.</given-names>
            <surname>Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Peters</surname>
          </string-name>
          .
          <article-title>CLEF 2008: Ad Hoc Track Overview</article-title>
          . In F. Borri,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nardi</surname>
          </string-name>
          , and C. Peters, editors,
          <source>Working Notes for the CLEF 2008 Workshop</source>
          . http://www.clef-campaign.org/2008/working_notes/adhoc-final.pdf [last visited
          <year>2008</year>
          , September 10],
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>F.</given-names>
            <surname>Borri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nardi</surname>
          </string-name>
          , and C. Peters, editors.
          <source>Working Notes for the CLEF 2009 Workshop. Published Online</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>M.</given-names>
            <surname>Braschler</surname>
          </string-name>
          .
          <article-title>CLEF 2001 - Overview of Results</article-title>
          . In Peters et al. [
          <volume>15</volume>
          ], pages
          <fpage>9</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Braschler</surname>
          </string-name>
          .
          <article-title>CLEF 2002 - Overview of Results</article-title>
          . In C. Peters,
          <string-name>
            <given-names>M.</given-names>
            <surname>Braschler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          , and M. Kluck, editors,
          <source>Advances in Cross-Language Information Retrieval: Third Workshop of the Cross-Language Evaluation Forum (CLEF</source>
          <year>2002</year>
          )
          <article-title>Revised Papers</article-title>
          , pages
          <fpage>9</fpage>
          -
          <lpage>27</lpage>
          . Lecture Notes in Computer Science (LNCS) 2785, Springer, Heidelberg, Germany,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>M.</given-names>
            <surname>Braschler</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Peters</surname>
          </string-name>
          .
          <article-title>CLEF 2003 Methodology and Metrics</article-title>
          . In C. Peters,
          <string-name>
            <given-names>M.</given-names>
            <surname>Braschler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          , and M. Kluck, editors,
          <source>Comparative Evaluation of Multilingual Information Access Systems: Fourth Workshop of the Cross-Language Evaluation Forum (CLEF 2003) Revised Selected Papers</source>
          , pages
          <fpage>7</fpage>
          -
          <lpage>20</lpage>
          . Lecture Notes in Computer Science (LNCS) 3237, Springer, Heidelberg, Germany,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Cleverdon</surname>
          </string-name>
          .
          <article-title>The Cranfield Tests on Index Languages Devices</article-title>
          . In K. Spärck Jones and P. Willett, editors,
          <source>Readings in Information Retrieval</source>
          , pages
          <fpage>47</fpage>
          -
          <lpage>60</lpage>
          . Morgan Kaufmann Publisher, Inc., San Francisco, CA, USA,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Gey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Dabney</surname>
          </string-name>
          .
          <article-title>Probabilistic Retrieval Based on Staged Logistic Regression</article-title>
          . In
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Belkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ingwersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Mark</given-names>
            <surname>Pejtersen</surname>
          </string-name>
          , and E. A. Fox, editors,
          <source>Proc. 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR</source>
          <year>1992</year>
          ), pages
          <fpage>198</fpage>
          -
          <lpage>210</lpage>
          . ACM Press, New York, USA,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          . Appendix D:
          <article-title>Results of the Grid@CLEF Track</article-title>
          . In Borri et al. [
          <volume>2</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>M.</given-names>
            <surname>Eibl</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kürsten</surname>
          </string-name>
          .
          <article-title>The Importance of being Grid - Chemnitz University of Technology at Grid@CLEF</article-title>
          . In Borri et al. [
          <volume>2</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          .
          <source>Specification of the CIRCO Framework, Version 0.10</source>
          . Technical Report IMS.2009.CIRCO.0.10, Department of Information Engineering, University of Padua, Italy,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Harman</surname>
          </string-name>
          .
          <article-title>Dealing with MultiLingual Information Access: Grid Experiments at TrebleCLEF</article-title>
          . In M. Agosti,
          <string-name>
            <given-names>F.</given-names>
            <surname>Esposito</surname>
          </string-name>
          , and C. Thanos, editors,
          <source>Post-proceedings of the Fourth Italian Research Conference on Digital Library Systems (IRCDL</source>
          <year>2008</year>
          ), pages
          <fpage>29</fpage>
          -
          <lpage>32</lpage>
          .
          ISTI-CNR at Gruppo ALI
          , Pisa, Italy,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Peters</surname>
          </string-name>
          .
          <article-title>From CLEF to TrebleCLEF: the Evolution of the Cross-Language Evaluation Forum</article-title>
          . In N. Kando and M. Sugimoto, editors,
          <source>Proc. 7th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access</source>
          , pages
          <fpage>577</fpage>
          -
          <lpage>593</lpage>
          . National Institute of Informatics, Tokyo, Japan,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Peters</surname>
          </string-name>
          .
          <article-title>CLEF Ad-hoc: A Perspective on the Evolution of the Cross-Language Evaluation Forum</article-title>
          . In M. Agosti,
          <string-name>
            <given-names>F.</given-names>
            <surname>Esposito</surname>
          </string-name>
          , and C. Thanos, editors,
          <source>Post-proceedings of the Fifth Italian Research Conference on Digital Library Systems (IRCDL</source>
          <year>2009</year>
          ), pages
          <fpage>72</fpage>
          -
          <lpage>79</lpage>
          . DELOS Association and Department of Information Engineering of the University of Padua,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Larson</surname>
          </string-name>
          .
          <article-title>Decomposing Text Processing for Retrieval: Cheshire tries GRID@CLEF</article-title>
          . In Borri et al. [
          <volume>2</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>C.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Braschler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          , and M. Kluck, editors.
          <source>Evaluation of Cross-Language Information Retrieval Systems: Second Workshop of the Cross-Language Evaluation Forum (CLEF 2001) Revised Papers</source>
          . Lecture Notes in Computer Science (LNCS) 2406, Springer, Heidelberg, Germany,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          .
          <article-title>The methodology of information retrieval experiment</article-title>
          . In K. Spärck Jones, editor,
          <source>Information Retrieval Experiment</source>
          , pages
          <fpage>9</fpage>
          -
          <lpage>31</lpage>
          . Butterworths, London, United Kingdom,
          <year>1981</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>