<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Introduction to the Working Notes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Italy carol.peters@isti.cnr.it</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2007</year>
      </pub-date>
      <abstract>
        <p>The objective of the Cross Language Evaluation Forum (CLEF) is to promote research in the field of multilingual system development. This is done through the organisation of annual evaluation campaigns in which a series of tracks designed to test different aspects of mono- and cross-language information retrieval (IR) are offered. The intention is to encourage experimentation with all kinds of multilingual information access - from the development of systems for monolingual retrieval operating on many languages to the implementation of complete multilingual multimedia search services. This has been achieved by offering an increasingly complex and varied set of evaluation tasks over the years. The aim is not only to meet but also to anticipate the emerging needs of the R&amp;D community and to encourage the development of next generation multilingual IR systems. These Working Notes contain descriptions of the experiments conducted within CLEF 2007 - the eighth in a series of annual system evaluation campaigns. The results of the experiments will be presented and discussed in the CLEF 2007 Workshop, 19-21 September, Budapest, Hungary. The final papers - revised and extended as a result of the discussions at the Workshop - together with a comparative analysis of the results will appear in the CLEF 2007 Proceedings, to be published by Springer in their Lecture Notes in Computer Science series. Since CLEF 2005, the Working Notes have been published in electronic format only and are distributed to participants at the Workshop on CD-ROM together with the Book of Abstracts in printed form. All reports included in the Working Notes will also be inserted in the DELOS Digital Library, accessible at http://delos-dl.isti.cnr.it. Both Working Notes and Book of Abstracts are divided into eight sections, corresponding to the CLEF 2007 evaluation tracks, plus an additional section describing other evaluation initiatives using CLEF data: MorphoChallenge 2007 and SemEval 2007.
In addition, appendices contain run statistics for the Ad Hoc, Domain-Specific, GeoCLEF and CL-SR tracks, plus a list of all participating groups showing the tracks in which they took part. The main features of the 2007 campaign are briefly outlined below in order to provide the necessary background to the experiments reported in the rest of the Working Notes.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Cross-Language Text Retrieval (Ad Hoc): This year, this track offered mono- and bilingual tasks on target
collections for central European languages (Bulgarian, Czech and Hungarian; Czech was new this year). As last year, a bilingual
task encouraging system testing with non-European languages against English documents was offered. Topics
were made available in Amharic, Chinese, Oromo and Indonesian. A special sub-task on Indian language
search against an English target collection was also organised with the assistance of a number of Indian research
institutes, which were responsible for the preparation of the topics. The languages offered were Hindi, Bengali, Tamil, Telugu
and Marathi. In order to establish benchmarks in this sub-task, all participating groups had to submit:
- one monolingual English to English run (mandatory)
- at least one run in Hindi to English (mandatory)
- runs in other Indian languages to English (optional).</p>
      <p>A "robust" task was again offered, emphasizing the importance of reaching a minimal performance for all topics
instead of high average performance. Robustness is a key issue for the transfer of CLEF research into applications.
The 2007 robust task involved three languages often used in previous CLEF campaigns (English, French,
Portuguese). The track was coordinated jointly by ISTI-CNR and U.Padua (Italy) and U.Hildesheim (Germany).</p>
      <p>Cross-Language Scientific Data Retrieval (Domain-Specific): Mono- and cross-language domain-specific
retrieval was studied in the domain of social sciences using structured data (e.g. bibliographic data, keywords, and
abstracts) from scientific reference databases. The target collections provided were: GIRT-4 for German/English,
INION for Russian and Cambridge Sociological Abstracts for English. A multilingual controlled vocabulary
(German, English, Russian) suitable for use with GIRT-4 and INION, together with a bi-directional mapping
between this vocabulary and that used for indexing the Sociological Abstracts (English), was provided. Topics
were offered in English, German and Russian. This track was coordinated by IZ Bonn (Germany).</p>
      <p>Multilingual Question Answering (QA@CLEF): QA@CLEF 2007 offered both main and pilot tasks. The
main task scenario was topic-related QA, where the questions are grouped by topic and may contain anaphoric
references to one another. The answers were retrieved from heterogeneous document collections, i.e. news
articles and Wikipedia. Many sub-tasks were set up: monolingual - where the questions and the target collections
searched for answers are in the same language - and bilingual - where source and target languages are different.
Bulgarian, Dutch, English, French, German, Italian, Portuguese, Romanian and Spanish were offered as target
languages; query languages used in the bilingual tasks depended on demand (see the track overview for details).
Following the positive response at QA@CLEF 2006, the Answer Validation Exercise (AVE) was proposed again. A
new pilot task was also offered: Question Answering on Speech Transcript (QAst), in which the answers to
factual questions have to be extracted from spontaneous speech transcriptions (manual and automatic
transcriptions) coming from different human interaction scenarios. The track was organized by several institutions
(one for each source language) and jointly coordinated by CELCT, Trento (Italy), LSI-UNED, Madrid and UPC,
Barcelona (Spain).</p>
      <p>Cross-Language Retrieval in Image Collections (ImageCLEF): This track evaluated retrieval of images
described by text captions in several languages; both text and image retrieval techniques could be exploited. Four
challenging tasks were offered: (i) multilingual ad-hoc retrieval (collection with mixed English/German/Spanish
annotations, queries in more languages), (ii) medical image retrieval (case notes in English/French/German;
visual, mixed, semantic queries in same languages), (iii) hierarchical automatic image annotation for medical
images (fully categorized in English and German, purely visual task), (iv) photographic annotation through
detection of objects in images (using the same collection as (i) with a restricted number of objects, a purely visual
task). Image retrieval was not required for all tasks and a default visual and textual retrieval system was made
available for participants. The track coordinators were U.Sheffield (UK) and the U. and U. Hospitals of Geneva
(Switzerland). Oregon Health and Science U. (US), Victoria U., Melbourne (Australia), RWTH Aachen (Germany)
and Vienna Univ. Tech (Austria) collaborated in the task organization.</p>
      <p>Cross-Language Speech Retrieval (CL-SR): The focus is on searching spontaneous speech from oral history
interviews rather than news broadcasts. The test collection created for the track is a subset of a large archive of
videotaped oral histories from survivors, liberators, rescuers and witnesses of the Holocaust created by the
Survivors of the Shoah Visual History Foundation (VHF). Automatic Speech Recognition (ASR) transcripts and
both automatically assigned and manually assigned thesaurus terms were available as part of the collection.
In 2006 the CL-SR track included search collections of conversational English and Czech speech using six
languages (Czech, Dutch, English, French, German and Spanish). In CLEF 2007 additional topics were added for
the Czech speech collection. Speech content is described by automatic speech transcriptions, manually and
automatically assigned controlled vocabulary descriptors for concepts, dates and locations, manually assigned</p>
    </sec>
    <sec id="sec-2">
      <title>3 New this year.</title>
      <p>person names, and hand-written segment summaries. The track was coordinated by U. Maryland (USA), Dublin
City U. (Ireland) and Charles U. (Czech Republic).</p>
      <p>Multilingual Web Retrieval (WebCLEF): The WebCLEF 2007 task combines insights gained from previous
editions of WebCLEF 2005–2006 and the WiQA 2006 pilot, and goes beyond the navigational queries considered
at WebCLEF 2005 and 2006. At WebCLEF 2007 so-called undirected informational search goals were considered
in a web setting: “I want to learn anything/everything about my topic.” The track was coordinated by U.
Amsterdam (The Netherlands).</p>
      <p>Cross-Language Geographical Retrieval (GeoCLEF): The purpose of GeoCLEF is to test and evaluate
cross-language geographic information retrieval (GIR): retrieval for topics with a geographic specification.
GeoCLEF 2007 consisted of two sub-tasks. A search task ran for the third time and a query classification task was
organized for the first time. For the GeoCLEF 2007 search task, twenty-five search topics were defined by the
organizing groups for searching English, German, Portuguese and Spanish document collections. Topics were
translated into English, German and Spanish. For the classification task, a query log from a search engine was
provided and the groups needed to identify the queries with a geographic scope and the geographic components
within the local queries. The track was coordinated jointly by UC Berkeley (USA), U.Sheffield (UK), U.
Hildesheim (Germany), Linguateca SINTEF (Norway) and Microsoft Asia (China).</p>
      <p>Details on the technical infrastructure and the organisation of these tracks can be found in the track overview
reports in this volume, located at the beginning of the relevant sections.</p>
      <sec id="sec-2-1">
        <title>Test Collections</title>
        <p>A number of different document collections were used in CLEF 2007 to build the test collections:
• CLEF multilingual comparable corpus of more than 3 million news documents in 13 languages; new data
was added this year for Czech, Bulgarian and English (see Table 1). Parts of this collection were used
in the Ad Hoc, Question Answering and GeoCLEF tracks.
• The GIRT-4 social science database in English and German (over 300,000 documents) and two Russian
databases: the Russian Social Science Corpus (approx. 95,000 documents) and the Russian ISISS
collection for sociology and economics (approx. 150,000 documents). The RSSC corpus was not used this year.
Cambridge Sociological Abstracts in English. These collections were used in the domain-specific track.
• The ImageCLEF track used collections for both general photographic and medical image retrieval:
- IAPR TC-12 photo database of 25,000 photographs with captions in English, German and
Spanish; PASCAL VOC 2006 training data (new this year);
- ImageCLEFmed radiological database consisting of 6 distinct datasets - 2 more than last year;
IRMA collection in English and German of 12,000 classified images for automatic medical
image annotation.
• Malach collection of spontaneous conversational speech derived from the Shoah archives in English
(more than 750 hours) and Czech (approx. 500 hours). This collection was used in the speech retrieval
track.
• EuroGOV, a multilingual collection of about 3.5M webpages, containing documents in many languages
crawled from European governmental sites, used in the WebCLEF track.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Technical Infrastructure</title>
        <p>The CLEF technical infrastructure is managed by the DIRECT system. DIRECT manages the test data, results
submission and analyses for the ad hoc, question answering and geographic IR tracks. It has been designed not only to
facilitate data management tasks but also to support the production, maintenance, enrichment and interpretation of
the scientific data for subsequent in-depth evaluation studies.</p>
        <p>The technical infrastructure is thus responsible for:
• the track set-up, harvesting of documents, management of the registration of participants to tracks;
• the submission of experiments, collection of metadata about experiments, and their validation;
• the creation of document pools and the management of relevance assessment;
• the provision of common statistical analysis tools for both organizers and participants in order to allow the
comparison of the experiments;
• the provision of common tools for summarizing, producing reports and graphs on the measured
performances and conducted analyses.</p>
        <p>DIRECT is designed and implemented by Giorgio Di Nunzio and Nicola Ferro.</p>
        <p>Footnote 4: The number of tokens extracted from each document can vary slightly across systems, depending on the respective definition
of what constitutes a token. Consequently, the number of tokens and features given in this table are approximations and may
differ from actual implemented systems.</p>
        <p>[Figure 1: participation by region (Oceania, South America, North America, Asia, Europe), 2000-2007.]</p>
        <p>[Figure 2: CLEF 2000-2007 Tracks. Participating groups per track (AdHoc, DomSpec, iCLEF, CL-SR, QA@CLEF, ImageCLEF, WebCLEF, GeoCLEF) by year, 2000-2007.]</p>
      </sec>
      <sec id="sec-2-3">
        <title>Participation</title>
        <p>A total of 81 groups submitted runs in CLEF 2007, slightly down from the 90 groups of CLEF 2006 (last year's figures are given in brackets): 51 (59.5) from
Europe, 14 (14.5) from N. America, 14 (10) from Asia, 1 (4) from S. America and 1 (1) from Australia. The
breakdown of participation of groups per track is as follows: Ad Hoc 22 (25); Domain-Specific 5 (4); QA@CLEF
28 (37); ImageCLEF 35 (25); CL-SR 8 (6); WebCLEF 4 (8); GeoCLEF 13 (17). A list of groups and indications of
the tracks in which they participated is given in the Appendix to these Working Notes. Figure 1 shows the variation
in participation over the years and Figure 2 shows the shift in focus as new tracks have been added.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5 Last year’s figures are between brackets.</title>
      <p>In particular, these figures show that while there is a constant increase in interest in the ImageCLEF track, there is
a consistent decrease in popularity of the question answering and web tracks. Although the fluctuation in QA does
not seem to be of great significance - this is a very difficult task - the apparent lack of interest in WebCLEF is
surprising. Given the importance of the Internet and web search engines, greater participation in this task would be
expected. The large numbers for ImageCLEF also give rise to some discussion. The defining feature of CLEF is its
multilinguality; ImageCLEF is perhaps the least multilingual of the CLEF tracks as much of the work is done in a
language-independent context. These questions will be the subject of debate at the workshop. At the same time, it
should be noted that these Working Notes also include reports from two separate evaluation initiatives which
actually used CLEF data for certain tasks – thus the impact of CLEF spreads far beyond the boundaries of the
CLEF evaluation campaigns.</p>
      <sec id="sec-3-1">
        <title>Workshop</title>
        <p>CLEF aims at creating a strong CLIR/MLIR research and development community. The Workshop plays an
important role by providing the opportunity for all the groups that have participated in the evaluation campaign to
get together to compare approaches and exchange ideas. The work of the groups participating in this year’s
campaign will be presented in plenary paper and poster sessions. There will also be break-out sessions for more
in-depth discussion of the results of individual tracks and intentions for the future. The final sessions will include
discussions on ideas for new tracks in future campaigns. Overall, the Workshop should provide an ample
panorama of the current state-of-the-art and the latest research directions in the multilingual information retrieval
area. I very much hope that it will prove an interesting, worthwhile and enjoyable experience for all those who
participate.</p>
        <p>The final programme and the presentations at the Workshop will be posted on the CLEF website at
http://www.clef-campaign.org.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Acknowledgements</title>
        <p>It would be impossible to run the CLEF evaluation initiative and organize the annual workshops without
considerable assistance from many groups. CLEF is organized on a distributed basis, with different research
groups being responsible for the running of the various tracks. My gratitude goes to all those who have been
involved in the coordination of the 2007 campaigns. A list of the main institutions involved is given on the
following page. Here below, let me thank the people mainly responsible for the coordination of the different tracks:
• Giorgio Di Nunzio, Nicola Ferro and Thomas Mandl for the Ad Hoc Track
• Vivien Petras, Stefan Baerisch, Maximillian Stempfhuber for the Domain-Specific track
• Bernardo Magnini, Danilo Giampiccolo, Pamela Forner, Anselmo Peñas, Christelle Ayache, Corina
Forăscu, Valentin Jijkoun, Petya Osenova, Paulo Rocha, Bogdan Sacaleanu, Richard Sutcliffe for
QA@CLEF
• Allan Hanbury, Paul Clough, Henning Müller, Thomas Deselaers, Michael Grubinger, Jayashree
Kalpathy-Cramer and William Hersh for ImageCLEF
• Douglas W. Oard, Gareth J. F. Jones, and Pavel Pecina for CL-SR
• Valentin Jijkoun and Maarten de Rijke for WebCLEF
• Thomas Mandl, Fredric Gey, Giorgio Di Nunzio, Nicola Ferro, Ray Larson, Mark Sanderson, Diana
Santos, Christa Womser-Hacker, Xing Xie for GeoCLEF</p>
        <p>I also thank all those colleagues who have helped us by preparing topic sets in different languages and in particular
the NLP Lab. Dept. of Computer Science and Information Engineering of the National Taiwan University for their
work on Chinese.</p>
        <p>I should also like to thank the members of the CLEF Steering Committee who have assisted me with their advice
and suggestions throughout this campaign.</p>
        <p>Furthermore, I gratefully acknowledge the support of all the data providers and copyright holders, and in
particular:
• The Los Angeles Times, for the American-English data collection
• SMG Newspapers (The Herald) for the British-English data collection
• Le Monde S.A. and ELDA: Evaluations and Language resources Distribution Agency, for the French data
• Frankfurter Rundschau, Druck und Verlagshaus Frankfurt am Main; Der Spiegel, Spiegel Verlag,
Hamburg, for the German newspaper collections
• InformationsZentrum Sozialwissenschaften, Bonn, for the GIRT database
• SocioNet system for the Russian Social Science Corpora
• Hypersystems Srl, Torino and La Stampa, for the Italian data
• Agencia EFE S.A. for the Spanish data
• NRC Handelsblad, Algemeen Dagblad and PCM Landelijke dagbladen/Het Parool for the Dutch
newspaper data
• Aamulehti Oyj and Sanoma Osakeyhtiö for the Finnish newspaper data
• Russika-Izvestia for the Russian newspaper data
• Público, Portugal, and Linguateca for the Portuguese (PT) newspaper collection
• Folha, Brazil, and Linguateca for the Portuguese (BR) newspaper collection
• Tidningarnas Telegrambyrå (TT) SE-105 12 Stockholm, Sweden for the Swedish newspaper data
• Schweizerische Depeschenagentur, Switzerland, for the French, German and Italian Swiss news agency
data
• Ringier Kiadoi Rt. [Ringier Publishing Inc.] and the Research Institute for Linguistics, Hungarian Acad.
Sci. for the Hungarian newspaper documents
• Sega AD, Sofia; Standart Nyuz AD, Novinar OD Sofia, and the BulTreeBank Project, Linguistic
Modelling Laboratory, IPP, Bulgarian Acad. Sci, for the Bulgarian newspaper documents
• Mafra a.s. and Lidové Noviny a.s. for the Czech newspaper data
• St Andrews University Library for the historic photographic archive
• University and University Hospitals, Geneva, Switzerland and Oregon Health and Science University for
the ImageCLEFmed Radiological Medical Database
• The Radiology Dept. of the University Hospitals of Geneva for the Casimage database and the PEIR
(Pathology Education Image Resource) for the images and the HEAL (Health Education Assets Library)
for the annotation of the PEIR dataset
• Aachen University of Technology (RWTH), Germany for the IRMA database of annotated medical
images
• Mallinckrodt Institute of Radiology for permission to use their nuclear medicine teaching file
• University of Basel's Pathopic project for their Pathology teaching file
• Michael Grubinger, administrator of the IAPR Image Benchmark, Clement Leung who initiated and
supervised the IAPR Image Benchmark Project, and André Kiwitz, the Managing Director of Viventura
for granting access to the image database and the raw image annotations of the tour guides
• The Survivors of the Shoah Visual History Foundation, and IBM for the Malach spoken document
collection</p>
        <p>Without their contribution, this evaluation activity would be impossible.</p>
        <p>Last but not least, I should like to express my gratitude to Alessandro Nardi and Valeria Quochi for their assistance
in the organisation of the CLEF 2007 Workshop.</p>
        <p>Maristella Agosti, University of Padova, Italy
Martin Braschler, Zurich University of Applied Sciences Winterthur, Switzerland
Amedeo Cappelli, ISTI-CNR &amp; CELCT, Italy
Hsin-Hsi Chen, National Taiwan University, Taipei, Taiwan
Khalid Choukri, Evaluations and Language resources Distribution Agency, Paris, France
Paul Clough, University of Sheffield, UK
Thomas Deselaers, RWTH Aachen University, Germany
David A. Evans, Clairvoyance Corporation, USA
Marcello Federico, ITC-irst, Trento, Italy
Christian Fluhr, CEA-LIST, Fontenay-aux-Roses, France
Norbert Fuhr, University of Duisburg, Germany
Frederic C. Gey, U.C. Berkeley, USA
Julio Gonzalo, LSI-UNED, Madrid, Spain
Donna Harman, National Institute of Standards and Technology, USA
Gareth Jones, Dublin City University, Ireland
Franciska de Jong, University of Twente, Netherlands
Noriko Kando, National Institute of Informatics, Tokyo, Japan
Jussi Karlgren, Swedish Institute of Computer Science, Sweden
Michael Kluck, German Institute for International and Security Affairs, Berlin, Germany
Natalia Loukachevitch, Moscow State University, Russia
Bernardo Magnini, ITC-irst, Trento, Italy
Paul McNamee, Johns Hopkins University, USA
Henning Müller, University &amp; University Hospitals of Geneva, Switzerland
Douglas W. Oard, University of Maryland, USA
Maarten de Rijke, University of Amsterdam, Netherlands
Diana Santos, Linguateca, Sintef, Oslo, Norway
Jacques Savoy, University of Neuchatel, Switzerland
Peter Schäuble, Eurospider Information Technologies, Switzerland
Richard Sutcliffe, University of Limerick, Ireland
Max Stempfhuber, Informationszentrum Sozialwissenschaften Bonn, Germany
Hans Uszkoreit, German Research Center for Artificial Intelligence (DFKI), Germany
Felisa Verdejo, LSI-UNED, Madrid, Spain
José Luis Vicedo, University of Alicante, Spain
Ellen Voorhees, National Institute of Standards and Technology, USA
Christa Womser-Hacker, University of Hildesheim, Germany</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>