<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgio M. Di Nunzio</string-name>
          <email>dinunzio@dei.unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Ferro</string-name>
          <email>ferro@dei.unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gareth J.F. Jones</string-name>
          <email>gjones@computing.dcu.ie</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carol Peters</string-name>
          <email>carol.peters@isti.cnr.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Multilingual Information Access, Cross-Language Information Retrieval</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padua</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Experimentation</institution>
          ,
          <addr-line>Performance, Measurement, Algorithms</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>ISTI-CNR, Area di Ricerca</institution>
          ,
          <addr-line>56124 Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>School of Computing, Dublin City University</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2005</year>
      </pub-date>
      <abstract>
        <p>We describe the objectives and organization of the CLEF 2005 ad hoc track and discuss the main characteristics of the tasks offered to test monolingual, bilingual and multilingual textual document retrieval. The performance achieved for each task is presented and a preliminary analysis of results is given. The paper focuses in particular on the multilingual tasks which reused the test collection created in CLEF 2003 in an attempt to see if an improvement in system performance over time could be measured, and also to examine the multilingual results merging problem.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In this paper we describe the track setup, the evaluation methodology and the participation in the different tasks
(Section 2), present the main characteristics of the experiments and show the results (Sections 3-5). The
final section provides a brief summing up. For information on the various approaches and resources used by the
groups participating in this track and the issues they focused on, we refer the reader to the other papers in these
Working Notes.</p>
    </sec>
    <sec id="sec-2">
      <title>Track Setup</title>
      <p>
        The ad hoc track in CLEF adopts a corpus-based, automatic scoring method for the assessment of system
performance, based on ideas first introduced in the Cranfield experiments [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] in the late 1960s. The test
collection used consists of a set of “topics” describing information needs and a collection of documents to be
searched to find those documents that satisfy the information needs. Evaluation of system performance is then
done by judging the documents retrieved in response to a topic with respect to their relevance, and computing the
recall and precision measures. The distinguishing feature of CLEF is that it applies this evaluation paradigm in a
multilingual setting. This means that the criteria normally adopted to create a test collection, consisting of
suitable documents, sample queries and relevance assessments, have been adapted to satisfy the particular
requirements of the multilingual context. All language dependent tasks such as topic creation and relevance
judgment are performed in a distributed setting by native speakers. Rules are established and a tight central
coordination is maintained in order to ensure consistency and coherency of topic and relevance judgment sets
over the different collections, languages and tracks.
      </p>
      <sec id="sec-2-1">
        <title>Test Collection</title>
        <p>This year, for the first time, separate test collections were used in the ad hoc track: the monolingual and bilingual
tasks were based on document collections in Bulgarian, English, French, Hungarian and Portuguese, whereas the
two multilingual tasks reused a test collection (documents, topics and relevance assessments) created in CLEF
2003.</p>
        <p>
          Documents: The document collections used for the CLEF 2005 ad hoc tasks are part of the CLEF multilingual
corpus of news documents described in the Introductory paper to these Working Notes [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. In the monolingual
and bilingual tasks, the English, French and Portuguese collections consisted of national newspapers and news
agencies for the period 1994 and 1995. Different variants were used for each language. Thus, for English we had
both US and British newspapers, for French we had a national newspaper of France plus Swiss French news
agencies, and for Portuguese we had national newspapers from both Portugal and Brazil. This meant that, for
each language, there were significant differences in orthography and lexicon over the sub-collections. This is a
real world situation and system components, i.e. stemmers, translation resources, etc., should be sufficiently
robust to handle such variants. The Bulgarian and Hungarian collections used in these tasks were new in CLEF
2005 and consisted of national newspapers for the year 2002<sup>1</sup>. This meant that the collections we used in the ad
hoc mono- and bilingual tasks this year were not all for the same time period. This had important consequences
on topic creation. For the multilingual tasks we reused the CLEF 2003 multilingual document collection. This
consisted of news documents for 1994-95 in the 8 languages listed above in the Introduction.
        </p>
        <p>
Topics: Topics in CLEF are structured statements representing information needs; the systems use the topics to
derive their queries. Each topic consists of three parts: a brief “title” statement; a one-sentence “description”; a
more complex “narrative” specifying the relevance assessment criteria. Sets of 50 topics were created for the
CLEF 2005 ad hoc mono- and bilingual tasks. One of the decisions taken early on in the organization of the
CLEF ad hoc tracks was that the same set of topics would be used to query all collections, whatever the task.
There are a number of reasons for this: it makes it easier to compare results over different collections, it means
that there is a single master set that is rendered in all query languages, and a single set of relevance assessments
for each language is sufficient for all tasks. However, the fact that the collections used in the CLEF 2005 ad hoc
mono- and bilingual tasks were from two different time periods (1994-1995 and 2002) made topic creation
particularly difficult. It was not possible to create time-dependent topics that referred to particular date-specific
events as all topics had to refer to events that could have been reported in any of the collections, regardless of the
dates. This meant that the CLEF 2005 topic set is somewhat different from the sets of previous years as the
topics tend to be of broad coverage. For this reason, it was difficult to construct topics that would find a limited
number of relevant documents in each collection, and a (probably excessive) number of the topics used for the
2005 mono- and bilingual tasks have a very large number of relevant documents. We have yet to analyze the
possible impact of this fact on results calculation, but we suspect that it has meant that this year’s ad hoc test
collection is less effective in “discriminating” between the performance of different systems.
        </p>
        <p><sup>1</sup> It proved impossible to find national newspapers in electronic form for 1994 and/or 1995 in these languages.</p>
        <p>The topic sets for the mono- and bilingual tasks were prepared in thirteen languages: Amharic, Bulgarian,
Chinese, English, French, German, Greek, Hungarian, Indonesian, Italian, Portuguese, Russian, and Spanish.
Twelve were actually used and, as usual, English was by far the most popular. To counter this, in previous years,
we placed restrictions on the possible topic languages for the bilingual task. We will probably reinstate some
such constraint in CLEF 2006 in order to promote the testing of systems with less common languages.</p>
        <p>For the multilingual task, the CLEF 2003 Dutch, English and Spanish sets of 60 topics were used. They were
divided into two sets: 20 topics for training and 40 for testing.</p>
        <p>The English version of a typical topic from CLEF 2005 is given below:</p>
        <preformat>&lt;top&gt;&lt;num&gt; C254 &lt;/num&gt;
&lt;EN-title&gt; Earthquake Damage &lt;/EN-title&gt;
&lt;EN-desc&gt; Find documents describing damage to property or persons caused by an
earthquake and specifying the area affected. &lt;/EN-desc&gt;
&lt;EN-narr&gt; Relevant documents will provide details on damage to buildings and
material goods or injuries to people as a result of an earthquake. The geographical
location (e.g. country, region, city) affected by the earthquake must also be
mentioned. &lt;/EN-narr&gt;&lt;/top&gt;</preformat>
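        <p>The three-part topic structure lends itself to simple field extraction. The following is a minimal sketch only and not part of any CLEF tooling: the helper names parse_topic and build_query, and the regular expression they rely on, are assumptions made for illustration. It shows one way a title+description (“TD”) query string could be derived from the topic shown above.</p>
        <preformat><![CDATA[
```python
import re

# English version of topic C254, copied from the example in the text.
TOPIC = """<top><num> C254 </num>
<EN-title> Earthquake Damage </EN-title>
<EN-desc> Find documents describing damage to property or persons caused by an
earthquake and specifying the area affected. </EN-desc>
<EN-narr> Relevant documents will provide details on damage to buildings and
material goods or injuries to people as a result of an earthquake. The geographical
location (e.g. country, region, city) affected by the earthquake must also be
mentioned. </EN-narr></top>"""


def parse_topic(text, lang="EN"):
    """Split one topic into num, title, desc and narr fields (hypothetical helper)."""
    fields = {}
    for tag in ("num", lang + "-title", lang + "-desc", lang + "-narr"):
        pattern = "<" + re.escape(tag) + r">(.*?)</" + re.escape(tag) + ">"
        match = re.search(pattern, text, flags=re.DOTALL)
        if match is None:
            raise ValueError("missing field: " + tag)
        # Drop the language prefix and collapse line breaks and repeated spaces.
        fields[tag.split("-")[-1]] = " ".join(match.group(1).split())
    return fields


def build_query(fields, parts=("title", "desc")):
    """Concatenate the chosen topic fields into a plain query string."""
    return " ".join(fields[name] for name in parts)


if __name__ == "__main__":
    topic = parse_topic(TOPIC)
    print(topic["num"], "|", build_query(topic))
```
]]></preformat>
        <p>Passing parts=("title", "desc", "narr") to build_query would give the all-fields variant; real systems would typically add tokenization, stopword removal and stemming after this step.</p>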
      </sec>
      <sec id="sec-2-2">
        <title>Relevance Assessment</title>
        <p>Relevance assessment for the mono- and bilingual tasks was performed by native speakers. The multilingual
tasks used the relevance assessments of 2003. The practice of assessing the results on the basis of the longest,
most elaborate formulation of the topic (the narrative) means that only using shorter formulations (title and/or
description) implicitly assumes a particular interpretation of the user’s information need that is not (explicitly)
contained in the actual query that is run in the experiment. The fact that such additional interpretations are
possible influences only the absolute values of the evaluation measures, which in general are inherently
difficult to interpret. However, comparative results across systems are usually stable regardless of different
interpretations.</p>
        <p>
          The number of documents in large test collections such as CLEF makes it impractical to judge every
document for relevance. Instead, approximate recall values are calculated using pooling techniques. The results
submitted by the participating groups were used to form a pool of documents for each topic and language by
collecting the highly ranked documents from all submissions. This pool was used for subsequent relevance
judgment. After calculating the effectiveness measures, the results were analyzed and run statistics produced and
distributed. A discussion of the results is given in Section 4. The individual results for all official ad hoc
experiments in CLEF 2005 are given in the Appendix at the end of these Working Notes. The stability of pools
constructed in this way and their reliability for post-campaign experiments is discussed in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] with respect to the CLEF 2003 pools.
        </p>
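        <p>The pooling step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure actually used by the organizers: the function name build_pool, the data layout and the depth parameter are hypothetical, and the text does not state how many highly ranked documents were taken from each submission.</p>
        <preformat><![CDATA[
```python
def build_pool(runs, depth):
    """Form a judgment pool per topic from several submitted runs.

    runs  -- list of runs; each run maps a topic id to a ranked list of
             document ids, best first (assumed layout)
    depth -- number of top-ranked documents taken from each run
             (assumed parameter; the value used in CLEF is not given here)
    """
    pool = {}
    for run in runs:
        for topic, ranking in run.items():
            # The pool is the union of the top-ranked documents of every run.
            pool.setdefault(topic, set()).update(ranking[:depth])
    return pool


if __name__ == "__main__":
    run_a = {"C254": ["d1", "d2", "d3"]}
    run_b = {"C254": ["d2", "d4", "d5"]}
    print(sorted(build_pool([run_a, run_b], depth=2)["C254"]))
```
]]></preformat>
        <p>Only documents in the pool are judged; documents outside it are treated as not relevant, which is why recall values obtained this way are approximate.</p>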
      </sec>
      <sec id="sec-2-3">
        <title>Participation Guidelines</title>
        <p>To carry out the retrieval tasks of the CLEF campaign, systems have to build supporting data structures.
Allowable data structures include any new structures built automatically (such as inverted files, thesauri,
conceptual networks, etc.) or manually (such as thesauri, synonym lists, knowledge bases, rules, etc.) from the
documents. They may not, however, be modified in response to the topics, e.g. by adding topic words that are
not already in the dictionaries used by their systems in order to extend coverage.</p>
        <p>Some CLEF data collections contain manually assigned, controlled or uncontrolled index terms. The use of
such terms has been limited to specific experiments that have to be declared as “manual” runs.</p>
        <p>Topics can be converted into queries that a system can execute in many different ways. Participants
submitting more than one set of results have used both different query construction methods and variants within
the same method. CLEF strongly encourages groups to determine what constitutes a base run for their
experiments and to include these runs (officially or unofficially) to allow useful interpretations of the results.
Unofficial runs are those not submitted to CLEF but evaluated using the trec_eval package. This year we have
used the new package written by Chris Buckley for TREC (trec_eval 7.3), which is available from the TREC website.</p>
        <p>As a consequence of limited evaluation resources, a maximum of 4 runs for each multilingual task and a
maximum of 12 runs overall for the bilingual tasks, including all language combinations, were accepted. The
number of runs for the monolingual task was limited to 12. No more than 4 runs were allowed for any
individual language combination. Overall, participants were allowed to submit at most 32 runs in total for the
multilingual, bilingual and monolingual tasks.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Result Calculation</title>
        <p>
          Evaluation campaigns such as TREC and CLEF are based on the belief that the effectiveness of IR systems can
be objectively evaluated by an analysis of a representative set of sample search results. For this, effectiveness
measures are calculated based on the results submitted by the participant and the relevance assessments. Popular
measures usually adopted for exercises of this type are Recall and Precision. Details on how they are calculated
for CLEF are given in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
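        <p>As a reminder of how these measures behave, the sketch below computes precision and recall at a cutoff, together with (non-interpolated) average precision and its mean over topics, which is the figure used to rank runs later in this paper. It is a textbook formulation given under stated assumptions; the function names and data layout are hypothetical, and the exact calculation used for CLEF is the one described in the cited methodology paper and implemented in trec_eval.</p>
        <preformat><![CDATA[
```python
def precision_recall(ranking, relevant, cutoff):
    """Precision and recall over the first `cutoff` retrieved documents."""
    retrieved = ranking[:cutoff]
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document.

    Relevant documents that are never retrieved contribute zero.
    """
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, qrels):
    """Average of average_precision over all judged topics.

    rankings -- topic id -> ranked list of document ids (assumed layout)
    qrels    -- topic id -> set of relevant document ids (assumed layout)
    """
    scores = [average_precision(rankings.get(t, []), qrels[t]) for t in qrels]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3", "d9"}
    print(precision_recall(ranking, relevant, cutoff=4))
    print(average_precision(ranking, relevant))
```
]]></preformat>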
      </sec>
      <sec id="sec-2-5">
        <title>Participants and Experiments</title>
        <p>As shown in Table 1, a total of 23 groups from 15 different countries submitted results for one or more of the
ad hoc tasks, a slight decrease on the 26 participants of last year. A total of 254 experiments were submitted,
nearly the same as the 250 experiments of 2003. Thus, there is a slight increase in the average number of
submitted runs per participant: from 9.6 runs/participant in 2004 to 11 runs/participant this year.</p>
        <p>As stated, participants were required to submit at least one title+description (“TD”) run per task in order to
increase comparability between experiments. The large majority of runs (188 out of 254, 74.02%) used this
combination of topic fields, 54 (21.27%) used all fields, 10 (3.94%) used the title field, and only 2 (0.79%) used
the description field. The majority of experiments were conducted using automatic query construction. Manual
runs tend to be a resource-intensive undertaking and it is likely that most participants interested in this type of
work concentrated their efforts on the interactive track. A breakdown into the separate tasks is shown in Table 2.</p>
        <p>Thirteen different topic languages were used for ad hoc experiments; the Dutch run was in the multilingual
tasks and used the CLEF 2003 topics. As always, the most popular language for queries was English, and French
was second. Note that Bulgarian and Hungarian, the new collections added this year, were also quite popular as
new monolingual tasks; Hungarian was also used in one case as a topic language in a bilingual run. The number of
runs per topic language is shown in Table 3.</p>
        <p>As stated, monolingual retrieval was offered for the following target collections: Bulgarian, French, Hungarian,
and Portuguese. As can be seen from Table 2, the number of participants and runs for each language was quite
similar, with the exception of Bulgarian, which has a slightly smaller participation. This year just 5 groups out of
16 (31.25%) submitted monolingual runs only (down from ten groups last year), and just one of these groups
was a first time participant in CLEF. This is in contrast with previous years where many new groups only
participated in monolingual experiments. This year, most of the groups submitting monolingual runs were doing
this as part of their bilingual or multilingual system testing activity.</p>
        <p>Table 4 shows the top five groups for each target collection, ordered by mean average precision. The table
reports: the short name of the participating group; the run identifier, specifying whether the run has participated
in the pool or not, and the page in Appendix A containing all figures and graphs for this run; the mean average
precision achieved by the run; and the performance difference between the first and the last participant. The
pages of appendix A containing the overview graphs are indicated under the name of the sub-task. Table 4
regards runs using title + description fields only (the mandatory run).</p>
        <p>All the groups in the top five had participated in previous editions of CLEF. Both pooled and not pooled
runs are among the best entries for each track. Finally, the trend observed in the previous
editions of CLEF is confirmed: differences between top performers for tracks with languages introduced in past
campaigns are small, in particular only 5.35% in the case of French (French monolingual has been offered in
CLEF since 2000) and 7.55% in the case of Portuguese, which was introduced last year. However, for the new
languages, Bulgarian and Hungarian, the differences are much greater, on the order of 25%, showing that there
should be room for improvement if these languages are offered in future campaigns.</p>
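        <p><sup>2</sup> Throughout the paper, language names are sometimes shortened by using their ISO-639 2-letter equivalent.</p>
        <table-wrap>
          <label>Table 4</label>
          <caption>
            <p>Best entries for the monolingual track (title + description topic fields only).</p>
          </caption>
          <table>
            <thead>
              <tr><th>Rank</th><th>Target collection</th><th>Participant</th><th>Run</th><th>Pool</th><th>Appendix page</th><th>MAP</th></tr>
            </thead>
            <tbody>
              <tr><td>1st</td><td>Bulgarian</td><td>jhu/apl</td><td>aplmobgd</td><td>pooled</td><td>A.232</td><td>32.03%</td></tr>
              <tr><td>1st</td><td>French</td><td>jhu/apl</td><td>aplmofra</td><td>pooled</td><td>A.261</td><td>42.14%</td></tr>
              <tr><td>1st</td><td>Hungarian</td><td>jhu/apl</td><td>aplmohud</td><td>pooled</td><td>A.294</td><td>41.12%</td></tr>
              <tr><td>1st</td><td>Portuguese</td><td>unine</td><td>UniNEpt2</td><td>pooled</td><td>A.338</td><td>38.75%</td></tr>
              <tr><td>2nd</td><td>Bulgarian</td><td>hummingbird</td><td>humBG05tde</td><td>pooled</td><td>A.230</td><td>29.18%</td></tr>
              <tr><td>2nd</td><td>French</td><td>unine</td><td>UniNEfr1</td><td>pooled</td><td>A.278</td><td>42.07%</td></tr>
              <tr><td>2nd</td><td>Hungarian</td><td>unine</td><td>UniNEhu3</td><td>not pooled</td><td>A.312</td><td>38.89%</td></tr>
              <tr><td>2nd</td><td>Portuguese</td><td>hummingbird</td><td>humPT05tde</td><td>not pooled</td><td>A.322</td><td>38.64%</td></tr>
              <tr><td>3rd</td><td>Bulgarian</td><td>unine</td><td>UniNEbg3</td><td>not pooled</td><td>A.242</td><td></td></tr>
            </tbody>
          </table>
        </table-wrap>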
      </sec>
      <sec id="sec-2-6">
        <title>Participant Rank</title>
        <p>[Table 4, continued: miracle, ST, pooled (A.235).]</p>
        <table-wrap>
          <caption>
            <p>Best entries for the bilingual tasks.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Rank</th><th>Participant</th><th>Run</th><th>Pool</th><th>Appendix page</th><th>MAP</th></tr>
            </thead>
            <tbody>
              <tr><td>1st</td><td>miracle</td><td>ENXST</td><td>pooled</td><td>A.135</td><td>23.55%</td></tr>
              <tr><td>1st</td><td>alicante</td><td>IRn-enfr-vexp</td><td>not pooled</td><td>A.158</td><td>35.90%</td></tr>
              <tr><td>1st</td><td>miracle</td><td>ENMST</td><td>not pooled</td><td>A.190</td><td>30.16%</td></tr>
              <tr><td>1st</td><td>unine</td><td>UniNEbipt1</td><td>pooled</td><td>A.216</td><td>34.04%</td></tr>
              <tr><td>1st</td><td>jhu/apl</td><td>aplbiidena</td><td>pooled</td><td>A.152</td><td>33.13%</td></tr>
              <tr><td>2nd</td><td>unine</td><td>UniNEbibg3</td><td>not pooled</td><td>A.143</td><td>13.99%</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The top participant of the 2-Years-On track achieves a 15.89% performance improvement with
respect to the top participant of CLEF 2003 Multi-8. On the other hand, the fourth participant of the 2-Years-On
track has a 59.15% decrease in performance with respect to the fourth participant of CLEF 2003 Multi-8.
Similarly, the top participant of the Merging track achieves a 6.24% performance improvement
with respect to the top participant of 2003.</p>
        <p>In general, for the 2-Years-On track there is a performance improvement only for the top
participant, while performance deteriorates quickly for the other participants with respect to 2003. On the
other hand, for the Merging track the performance improvement of the top participant with respect to 2003 is less
than in the case of the 2-Years-On track. There is also less variation between the submissions for the Merging
task than seen in the earlier 2003 runs. This is probably due to the fact that the participants were using the same
ranked lists, and that the variation in performance arises only from the merging strategies adopted.</p>
        <p>[Table fragment, multilingual tasks: 1st, Cmu, adhocM5Trntes, not pooled (A.93), 44.93% (+15.89%); 41.19% (+6.24%); UC Berkeley, bkmul8en3, pooled.]</p>
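        <p>To make the notion of a merging strategy concrete, the sketch below combines per-collection ranked lists by min-max normalizing the scores within each list and then sorting all documents by normalized score. This is a common baseline chosen here purely for illustration; it is not claimed to be the strategy of any participating group, and the function names and the (document, score) input layout are assumptions.</p>
        <preformat><![CDATA[
```python
def minmax(scores):
    """Rescale a list of scores linearly to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: treat every document as equally good.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def merge_ranked_lists(lists, k):
    """Merge per-collection result lists into one ranking of length k.

    lists -- collection name -> list of (doc_id, score), best first
             (assumed layout; raw scores need not be comparable across lists)
    k     -- number of documents to keep in the merged list
    """
    merged = []
    for ranking in lists.values():
        if not ranking:
            continue
        normalized = minmax([score for _, score in ranking])
        for (doc_id, _), value in zip(ranking, normalized):
            merged.append((value, doc_id))
    # Python's sort is stable, so ties keep the order in which lists were given.
    merged.sort(key=lambda pair: -pair[0])
    return [doc_id for _, doc_id in merged[:k]]


if __name__ == "__main__":
    lists = {
        "fr": [("f1", 10.0), ("f2", 5.0), ("f3", 0.0)],
        "de": [("g1", 3.0), ("g2", 2.0), ("g3", 1.0)],
    }
    print(merge_ranked_lists(lists, k=4))
```
]]></preformat>
        <p>Because every list is rescaled independently, this baseline implicitly assumes that each collection contributes equally good top documents; alternatives such as round-robin interleaving or weighting lists by estimated collection quality change only the body of merge_ranked_lists.</p>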
      </sec>
      <sec id="sec-2-7">
        <p>[Table fragment: 4th, isi-unige, AUTOEN, not pooled (A.96).]</p>
        <p>From a first rapid glance at the reports from the groups that participated in the bilingual ad hoc tasks, it appears
that this year’s experiments provide a good overview of most of the traditional approaches to CLIR when
matching between query and target collection, including n-gram indexing, machine translation,
machine-readable bilingual dictionaries, multilingual ontologies, pivot languages, and query and document translation.
Corpus-based approaches were perhaps less used than in previous years, continuing a trend first noticed in CLEF
2004. Veteran groups were mainly concerned with fine tuning and optimizing strategies already tried in previous
years. The issues examined were the usual ones: word-sense disambiguation, out-of-dictionary vocabulary, ways
to apply relevance feedback, results merging, etc.</p>
        <p>[Figures: two pairs of plots; axes labelled Average Precision, Interpolated Recall and Retrieved Documents (logarithmic scale).]</p>
        <p>
Although, as has already been mentioned, English was by far the most popular language for queries, some less
common and interesting query-to-target language pairs were tried, e.g. Amharic, Spanish and German to French,
and French to Portuguese. The track overview paper in the post-workshop proceedings will provide a more
in-depth analysis of the approaches adopted for these tasks in CLEF 2005. One of the objectives will be to see if the
hypothesis concerning a “blueprint for a successful CLIR system” proposed in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] can be confirmed.
        </p>
        <p>A main focus in the monolingual tasks was the development of new or the adaptation of existing stemmers
and/or morphological analysers for the “new” CLEF languages: Bulgarian and Hungarian.</p>
        <p>The multilingual tasks at CLEF 2005 were intended to assess whether re-use of the CLEF 2003 Multi-8 task
data could be used as an indication of progress in multilingual information retrieval and to provide common sets
of ranked lists to enable specific exploration of merging strategies for multilingual IR. The submissions to these
tasks show that multilingual performance can indeed be improved beyond that reported at CLEF 2003 both when
performing the complete retrieval process and when merging ranked result lists generated by other groups. The
initial running of this task suggests that there is scope for further improvement in multilingual IR from exploiting
ongoing improvements in IR methods, but also from focused exploration of merging techniques.</p>
        <p>Encouraged by the results of the multilingual tasks, we are currently considering running a similar
X-years-on task for the mono- and/or bilingual experiments in CLEF 2006, again with the aim of seeing if it is possible to
measure progress over time by testing new or updated systems against existing test collections.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cleverdon</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The Cranfield Tests on Index Language Devices</article-title>
          . In: Sparck-Jones,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Willett</surname>
          </string-name>
          , P. (eds.): Readings in Information Retrieval, Morgan Kaufmann (
          <year>1997</year>
          )
          <fpage>47</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>What happened in CLEF 2005? In this volume</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Braschler</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>CLEF 2003 - Overview of results</article-title>
          .
          <source>In: Fourth Workshop of the Cross-Language Evaluation Forum</source>
          ,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          <year>2003</year>
          , Trondheim, Norway,
          <year>2003</year>
          .
          <source>Revised papers. Lecture Notes in Computer Science 3237, Springer</source>
          <year>2004</year>
          ,
          <fpage>44</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Braschler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>CLEF 2003 Methodology and Metrics</article-title>
          .
          <source>In: Fourth Workshop of the Cross-Language Evaluation Forum</source>
          ,
          <string-name>
            <surname>CLEF</surname>
          </string-name>
          <year>2003</year>
          , Trondheim, Norway,
          <year>2003</year>
          .
          <source>Revised papers. Lecture Notes in Computer Science 3237, Springer</source>
          <year>2004</year>
          ,
          <fpage>7</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>The Impact of Evaluation on Multilingual Text Retrieval</article-title>
          ,
          <source>Proc. SIGIR</source>
          <year>2005</year>
          ,
          <fpage>603</fpage>
          -
          <lpage>604</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Braschler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>Cross-Language Evaluation Forum: Objectives, Results</article-title>
          , Achievements, Information Retrieval, Vol.
          <volume>7</volume>
          (
          <issue>1-2</issue>
          ), pp.
          <fpage>5</fpage>
          -
          <lpage>29</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>