Introduction to the CLEF 2003 Working Notes

Carol Peters
Istituto di Scienza e Tecnologie dell’Informazione (ISTI-CNR), Pisa, Italy
carol.peters@isti.cnr.it

These Working Notes contain descriptions of the experiments conducted within CLEF 2003, the fourth in a series of annual system evaluation campaigns organised by the Cross-Language Evaluation Forum¹. The results of the experiments will be presented and discussed in the CLEF 2003 Workshop, 21-22 August, Trondheim, Norway. The final papers, revised and extended as a result of the discussions at the Workshop, together with a comparative analysis of the results, will appear in the CLEF 2003 Proceedings, to be published by Springer in their Lecture Notes in Computer Science series.

CLEF organises a series of evaluation tracks designed to test different aspects of mono- and cross-language information retrieval system development. The intention is to encourage systems to move from monolingual text retrieval to the implementation of a full multilingual multimedia search service. The main features of the 2003 campaign are briefly outlined below in order to provide the necessary background to the experiments reported in this volume.

1. Tracks and Tasks in CLEF 2003

In CLEF we distinguish between the core tracks, which are offered regularly each year (the monolingual, bilingual, multilingual and domain-specific tracks), and additional tracks, which tend to be organised on a more experimental basis and have the objective of identifying new requirements and appropriate methodologies for their testing in the multilingual information access context. The core tracks are coordinated by the members of the CLEF consortium, whereas the additional tracks are organised by interested associated groups on a voluntary basis, under the CLEF umbrella. Tracks can consist of a number of separate tasks.
CLEF 2003 offered eight separate tracks evaluating the performance of systems for:
• multilingual information retrieval
• bilingual information retrieval
• monolingual (non-English) information retrieval
• domain-specific information retrieval
• interactive cross-language information retrieval
• mono- and cross-language question answering
• cross-language retrieval in image collections
• cross-language spoken document retrieval

1.1 Core Tracks

Multilingual/Bilingual/Monolingual Information Retrieval: The main track in CLEF is the multilingual one. One of the principal objectives of CLEF is to encourage developers to build systems capable of using a single query in a preferred query language to retrieve relevant documents in all languages in a multilingual collection, listing the results in a single, ranked list. In CLEF 2003, for the first time, two distinct multilingual tasks were offered: multilingual-4 and multilingual-8. The collection for multilingual-4 contained English, French, German and Spanish documents. Multilingual-8 involved searching a collection containing documents in eight languages: Dutch, English, Finnish, French, German, Italian, Spanish and Swedish.

The objective of the 2003 bilingual track was to encourage the tuning of systems running on challenging language pairs that did not include English; however, we also wanted to ensure comparability of results. For this reason, unlike CLEF 2002, where the bilingual track allowed participants to query any available target collection using queries in any permitted topic language, bilingual runs in CLEF 2003 were only accepted for one or more of the following source -> target language pairs: Italian -> Spanish, German -> Italian, French -> Dutch, Finnish -> German. Exceptionally, newcomers only (i.e. groups that had not previously taken part in a CLEF cross-language track) could choose to search the English document collection using a European topic language. At the last moment, we acquired a Russian collection and thus also included a Russian task in the bilingual track, permitting any language to be used for the queries.

¹ From October 2001, CLEF is supported by the European Commission under the IST programme (IST-2000-31002). The consortium members are: ISTI-CNR, Pisa; IZ Sozialwissenschaften, Bonn; ELRA/ELDA, Paris; Eurospider, Zurich; UNED, Madrid; NIST, USA. For more information, see: http://www.clef-campaign.org.

CLEF also provides the opportunity for monolingual system testing and tuning, and for building test suites in European languages other than English. There were eight different target collections for monolingual system testing in CLEF 2003.

For each of these tracks (mono-/bi-/multilingual), the participating systems constructed their queries (automatically or manually) from a common set of topics, created to simulate user information needs. Each topic consists of three parts: a brief title statement; a one-sentence description; a more complex narrative specifying the relevance assessment criteria. Topic sets were produced by native speakers in the nine document languages and additionally for Chinese. A Japanese topic set will be made available in the future for use by participants in further experiments. As in previous years, a condition was that, for each task attempted, a mandatory run using the title and description fields had to be submitted. The objective is to facilitate comparison between the results of different systems.

Relevance assessment was also performed in all cases by native speakers. The number of documents in large test collections such as CLEF makes it impractical to judge every document for relevance. Instead, approximate recall values are calculated using pooling techniques. The results submitted by the participating groups were used to form a pool of documents for each topic and for each language by collecting the highly ranked documents from all submissions.
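
The pooling step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used by the CLEF organisers: the function name build_pools, the layout of the run data, the depth parameter, and the sample topic and document identifiers are all hypothetical, and the text does not state how many top-ranked documents were taken from each submission.

```python
from collections import defaultdict


def build_pools(runs, depth):
    """Form one pool of document ids per (topic, language) pair.

    runs:  mapping run_id -> mapping (topic_id, language) -> list of
           document ids ranked best-first (hypothetical layout).
    depth: how many top-ranked documents to take from each run
           (an assumed parameter; the source gives no value).
    """
    pools = defaultdict(set)
    for ranked_by_key in runs.values():
        for key, ranked_docs in ranked_by_key.items():
            # Only the highly ranked documents of each run enter the pool;
            # using a set removes duplicates contributed by several runs.
            pools[key].update(ranked_docs[:depth])
    return dict(pools)


# Hypothetical example: two runs submitted for the same topic and language.
example_runs = {
    "runA": {("T1", "DE"): ["d1", "d2", "d3"]},
    "runB": {("T1", "DE"): ["d2", "d4", "d5"]},
}
example_pools = build_pools(example_runs, depth=2)
```

With depth=2, the pool for ("T1", "DE") contains d1, d2 and d4: only these documents would be passed to the assessors, and documents outside the pool are left unjudged.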
The results were then analysed and run statistics produced and distributed. These are given in Appendix B.

Domain-Specific Information Retrieval: The rationale for this track is to study CLIR on other types of collections, serving different kinds of information needs. The information provided by domain-specific scientific documents is highly targeted and contains much terminology. The domain-specific track offered mono- and bilingual tasks on the GIRT4 collection of social science documents. A set of 25 topics was prepared in three languages: English, German and Russian. The topics were created and relevance assessments were performed by domain experts.

More information on the organisation of CLEF evaluation campaigns and the creation of the test collections for the core tracks can be found in Martin Braschler and Carol Peters: CLEF 2002 Methodology and Metrics, in Advances in Cross-Language Information Retrieval: Results of the CLEF 2002 Evaluation Campaign. Lecture Notes in Computer Science, Vol. 2785, Springer 2003.

1.2 Additional Tracks

Interactive CLIR (iCLEF): This was the third year in which an interactive track was offered at CLEF. Two kinds of experiments were performed: experiments in cross-language document selection, where the aim was to compare the degree to which different translation strategies can help the retrieval of relevant documents, and full cross-language search experiments, where the user attempts to find as many relevant documents as possible with the help of a complete interactive CLIR system.

Multilingual Question Answering (QAatCLEF): Question answering was introduced as a new activity in CLEF 2003, and represents an important innovation, as successful QA systems need to integrate both information retrieval and natural language processing methodologies. The aim of this track is both to stimulate monolingual work in the question answering area on languages other than English and to encourage the development of experimental systems for cross-language QA.
The track thus offered both monolingual and bilingual tasks: monolingual systems were evaluated within the framework of three non-English European languages (Dutch, Italian and Spanish), while, in the cross-language task, Dutch, French, German, Italian or Spanish queries could be used on an English document collection. For each task, 200 questions were offered.

Cross-Language Retrieval in Image Collections (ImageCLEF): This track was offered for the first time in CLEF 2003 as a pilot experiment. The task was to retrieve as many relevant images as possible on the basis of an information need expressed in a language different from that of the document collection. A set of 50 queries was made available in five languages (Dutch, French, German, Italian and Spanish) to search a British-English image collection. Searches could make use of the image content, the text captions or both.

Cross-Language Spoken Document Retrieval (CL-SDR): The introduction of this track follows an initial experiment in 2002 within the framework of the DELOS Network of Excellence for Digital Libraries (http://delos-noe.iei.pi.cnr.it/). The aim this year in CLEF has been to evaluate CLIR systems on noisy automatic transcriptions of spoken documents with known story boundaries. A set of 100 topics (for both development and evaluation) was made available in English, French, German, Italian, Spanish and Dutch.

Details on the technical infrastructure, the collections used, and the organisation of these tracks can be found in the track overview reports in this volume, placed at the beginning of the relevant sections.

2. Document Collections

The main CLEF multilingual corpus consists of sets of documents in different European languages but with common features (e.g., same genre and time period, comparable content). The collection was considerably expanded for CLEF 2003 and now contains well over 1.5 million documents in nine languages: Dutch, English, Finnish, French, German, Italian, Russian, Spanish and Swedish.
Russian is an important new addition as it is the first collection in the CLEF corpus that does not use the Latin-1 encoding system; it has been formatted in XML and encoded in UTF-8. The CLEF corpus includes both newswires and national newspapers, and most collections cover the period 1994-1995. Exceptions are the Finnish collection, which goes from November 1994 through 1995, and the Russian dataset, which consists of newspaper documents for just 1995.

The domain-specific collection used in CLEF 2003 consisted of a new, much larger version of the GIRT (German Indexing and Retrieval Test) social science database already used in previous editions of CLEF. This collection of 151,319 documents consists of parallel corpora in English and in German. Controlled vocabularies in German-English and German-Russian were also made available to the participants in this track.

3. Participation

Participation in CLEF 2003 was slightly up with respect to the previous year, with 43 groups submitting results for one or more of the different tracks (compared with 37 in 2002): 10 from North America, 30 from Europe, and 3 from Asia. Of these, 9 were from industry and 34 from academia. Once again the number of system runs submitted for the core tracks increased dramatically (approximately 450 against the nearly 300 received in 2002). The breakdown of participation of groups per track is as follows: multilingual-8: 7; multilingual-4: 14; bilingual: 13; monolingual: 22; GIRT: 4; iCLEF: 5; QAatCLEF: 8; ImageCLEF: 4; CL-SDR: 4.

As in previous years, participating groups consist of a good mix of newcomers (15) and veteran groups (28) coming back for a second, third or even fourth time. Another important trend that is again noticeable this year is the progression of many of the returning groups to a more complex task: from monolingual to bilingual, from bilingual to multilingual.
CLEF 2003 posed a number of challenges to the participants, in particular with respect to the multilingual and bilingual tracks. We have been pleased by the results. We view the fact that 7 groups were brave enough to try the multilingual-8 task and that a considerable number of participants were willing to test their bilingual retrieval performance on “unusual” language pairs as very encouraging. Our aim is to stimulate groups to develop flexible systems, capable of handling many different language combinations rather than a favourite two or three.

Another very positive aspect of CLEF 2003 has been the number of new tracks and tasks that have been offered as pilot experiments. The aim has been to try out new ideas and develop new evaluation methodologies, suited to the emerging requirements of both system developers and users with respect to today’s digital collections, and to encourage work on many European languages rather than just those most widely used. CLEF is thus gradually pushing its participants towards the ultimate goal: the development of truly multilingual systems capable of processing collections in diverse media.

4. Working Notes and Workshop

The Working Notes provide a first description of the different experiments run by the participating groups. The volume is divided into two main parts: Core Tracks and Additional Tracks, with further subdivisions into sections largely corresponding to the different tracks. The papers have thus been placed in the section considered most appropriate, even though many papers describe more than one type of experiment. A final section contains reports on evaluation issues and other evaluation initiatives. There are two appendixes. Appendix A contains the run statistics for the question answering task. Appendix B gives a list of the participants and a summary of the characteristics of all runs for the core tracks together with overview graphs for the different tasks and individual statistics for each run.
The aim of the Workshop is to give all the groups that have participated in the CLEF 2003 evaluation campaign the opportunity to get together in order to compare approaches and to exchange ideas. The work of the groups participating in this year’s campaign will be presented in paper and poster sessions. Additional talks will include a report on the activities of the NTCIR evaluation initiative for Asian languages, and a report on cross-language information retrieval work at Moscow State University. The final session will discuss perspectives for future evaluation activities within the CLEF framework. The Workshop should thus provide an ample overview of the current state of the art and the latest research directions in the multilingual information retrieval area. The presentations at the Workshop will be posted on the CLEF website at http://www.clef-campaign.org.

We very much hope that the Workshop will prove an interesting, worthwhile and enjoyable experience to all those who participate.

Acknowledgements

It would be impossible to run the CLEF evaluation campaigns and organise the annual workshops without considerable assistance from many groups, working mainly on a voluntary basis. We have numerous people and organisations to thank for their help in the running of CLEF 2003.
I list some of them below.

The Workshop Steering Committee
Martin Braschler, Eurospider Information Technology, Switzerland
Khalid Choukri, Evaluations and Language resources Distribution Agency, Paris, France
Marcello Federico, Centro per la Ricerca Scientifica e Tecnologica, Istituto Trentino di Cultura, Italy
Julio Gonzalo Arroyo, Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia, Madrid, Spain
Donna Harman, National Institute of Standards and Technology, USA
Gareth Jones, University of Exeter, UK
Noriko Kando, National Institute of Informatics, Japan
Michael Kluck, IZ Sozialwissenschaften, Bonn, Germany
Bernardo Magnini, Centro per la Ricerca Scientifica e Tecnologica, Istituto Trentino di Cultura, Italy
Douglas W. Oard, University of Maryland, USA
Mark Sanderson, University of Sheffield, UK
Peter Schäuble, Eurospider Information Technology, Switzerland
Ellen Voorhees, National Institute of Standards and Technology, USA

CLEF Consortium Management
Francesca Borri, ISTI-CNR, Pisa, Italy

Associated Members of the CLEF Consortium
• Department of Information Studies, University of Tampere, Finland - responsible for work on the Finnish collection
• Human Computer Interaction and Language Engineering Laboratory, SICS, Kista, Sweden - responsible for work on the Swedish collection
• University of Twente, Centre for Telematics and Information Technology, The Netherlands - responsible for work on the Dutch collection
• Universität Hildesheim, Institut für Angewandte Sprachwissenschaft - Informationswissenschaft, Germany - responsible for checking and revision of the multilingual topic set
• College of Information Studies and Institute for Advanced Computer Studies, University of Maryland, College Park, MD, USA - co-organisers of iCLEF
• Centro per la Ricerca Scientifica e Tecnologica, Istituto Trentino di Cultura, Italy - main coordinators of the multilingual question answering track and co-organisers of the cross-language spoken document retrieval track
• University of Exeter - co-organisers of the cross-language spoken document retrieval track
• University of Sheffield - coordinators of the cross-language image retrieval track

In addition, I should like to thank colleagues from the Natural Language Processing Lab, Department of Computer Science and Information Engineering, National Taiwan University, for preparing topics in Chinese, and the Moscow State University, Russia, for their assistance in obtaining the Russian collection.

We also gratefully acknowledge the support of all the data providers and copyright holders, and in particular:
• The Los Angeles Times, for the American English data collection;
• Le Monde S.A. and ELDA: Evaluations and Language resources Distribution Agency, for the French data;
• Frankfurter Rundschau, Druck und Verlagshaus Frankfurt am Main; Der Spiegel, Spiegel Verlag, Hamburg, for the German newspaper collections;
• InformationsZentrum Sozialwissenschaften, Bonn, for the GIRT database;
• Hypersystems Srl, Torino and La Stampa, for the Italian newspaper data;
• Agencia EFE S.A., for the Spanish newswire data;
• NRC Handelsblad, Algemeen Dagblad and PCM Landelijke dagbladen/Het Parool, for the Dutch newspaper data;
• Aamulehti Oyj, for the Finnish newspaper documents;
• Tidningarnas Telegrambyrå, for the Swedish newspapers;
• The Herald 1995, SMG Newspapers, for the British English newspaper data;
• Schweizerische Depeschenagentur, Switzerland, for the French, German and Italian Swiss news agency data;
• Russika-Izvestia, for the Russian collection;
• St Andrews University Library, for the image collection;
• NIST, for access to the TREC-8 and TREC-9 SDR transcripts.

Without their help, this evaluation activity would be impossible.

Last but not least, I should like to express our gratitude to both Francesca Borri in Pisa and the ECDL conference organisers in Trondheim for their assistance in the local organisation of the CLEF 2003 Workshop.

Carol Peters
August 2003