Introduction to the CLEF 2011 Labs

Vivien Petras (1) and Paul Clough (2)

(1) Berlin School of Library and Information Science, Humboldt-Universität zu Berlin
vivien.petras@ibi.hu-berlin.de
(2) Information School, University of Sheffield, United Kingdom
p.d.clough@sheffield.ac.uk

1 Introduction

The CLEF Labs are a continuation of tracks from previous CLEF workshops. In 2010, CLEF went from being a workshop collocated with an existing conference to a conference in its own right. The CLEF Labs are an integral part of the conference and two different types of lab are offered: (1) benchmarking or "campaign-style" labs and (2) workshop-style labs.

The benchmarking labs on the whole follow the traditional ("campaign-style") cycle of activities of a large-scale information retrieval evaluation experiment:

- Definition of the task by the lab organisers;
- Preparation and release of data collections and test data by the lab organisers;
- Release of queries, topics or other task descriptions by the lab organisers;
- Submission of experimental results by participants;
- Relevance assessments and/or results analysis by the lab organisers;
- Release of relevance assessments and/or results analysis by the lab organisers;
- Submission of working notes on experiments and results analysis by participants;
- Presentation and discussion at the CLEF conference.

Workshop-style labs are offered as a way of exploring possible benchmarking activities and as a means to discuss information retrieval evaluation issues from different perspectives.

Seven labs were offered in CLEF 2011: six benchmarking labs and one workshop. By August 2011, 96 research groups had submitted experimental results to a benchmarking activity and 107 participants had registered to attend one of the lab sessions at CLEF. The labs are discussed in the following sections.

2 Lab Selection Process

A call for lab proposals was distributed in October 2010 and a lab selection committee was formed from leading IR researchers (see Acknowledgements). Lab proposals were requested to include a detailed description of the topics and goals of the lab, the target audience and potential opportunities for future versions of the lab, as well as details about the planning and organisation of the lab (e.g. suggested tasks, the data collections to be used, and the background of the lab organisers and steering committee members). Proposals were submitted by 8th November 2010 and the selection committee had two weeks to decide on the labs that would feature at CLEF 2011. Selection committee members reviewed all proposals and submitted both an individual ranking for each proposal and their comments on the lab. All reviews were aggregated by the Lab Chairs and a general ranking of the labs was produced.

Reviewing criteria for the benchmarking labs included (from the call for proposals [1]):

- Soundness of methodology and feasibility of the task;
- Potential use and business cases for the lab, e.g. a description of the underlying problems, industrial stakeholders and the market potential;
- Estimated number of potential participants / critical mass;
- Clear movement along a growth path and development of the field.

Reviewing criteria for the workshop-style labs included (from the call for proposals):

- Appropriateness to the overall information access agenda pursued by CLEF;
- Estimated number of potential participants / critical mass;
- Whether outcomes of the workshop would make significant contributions to the field.
Additional factors, such as innovation, minimal overlap with other evaluation initiatives and events, the vision for potential continuation, and the interdisciplinary nature of the lab, were also considered. Of the nine submitted proposals, six were accepted as benchmarking labs and one as a workshop, an acceptance rate of 7/9 (approx. 78%).

[1] http://clef2011.org/resources/call_for_labs/CLEF2011-Call_for_Lab_proposals.pdf

3 Descriptions of the Labs

The following benchmarking labs ran in CLEF 2011:

CLEF-IP: IR in the IP domain: CLEF-IP used a collection of ca. 2 million patent documents in English, German, and French, together with patent images. Four tasks were organised: the Prior Art Candidate Search task for finding potential prior art for a document; the Image-based Document Retrieval task for finding patent documents or images for a patent document containing images; the Classification task for classifying documents according to the International Patent Classification scheme; and the Image-based Classification task for categorising patent images into pre-defined image categories.

Cross-Language Image Retrieval (ImageCLEF): This was the 9th year of running this track in CLEF. Four tasks were offered this year: image retrieval from Wikipedia, medical image retrieval with a data collection from the scientific literature, visual plant species classification of leaf images, and a photo annotation task based on user-generated annotations/tags from Flickr.

LogCLEF: In its 3rd CLEF year, LogCLEF aims to analyse transaction logs and classify queries in order to help understand users' search behaviour in multilingual contexts. Three tasks were organised: a query language identification task, a query classification task against a set of predetermined categories, and the analysis of logs with respect to the 'success' of a query. Three different log files were made available: logs from The European Library, logs from the Sogou Chinese search engine, and logs from the Deutsche Bildungsserver.

MusiCLEF: This was a newcomer to CLEF and the first year of running the lab. The goal of the lab is to aid the development of novel methodologies for both content-based and context-based (e.g. tags, comments, reviews, etc.) access to and retrieval of music. Two tasks were offered in 2011: content- and context-based music retrieval for music categorisation, and music identification for clustering the same (or similar) works. The test collection consisted of 10,000 songs in MP3 format.

QA4MRE - Question Answering for Machine Reading Evaluation: This was a new task proposed by organisers of the previous Question Answering (QA) track of CLEF. The aim of the lab is to evaluate machine reading ability through question answering and reading comprehension tests. The task involves answering multiple-choice questions based on reading a single document, with some inference from previously acquired background knowledge.

Uncovering Plagiarism, Authorship, and Wikipedia Vandalism (PAN): This was the second year of running PAN at CLEF and three tasks were offered to participants: plagiarism detection, author identification, and Wikipedia vandalism detection. The last task seeks to support users of Wikipedia in their search for 'vandalised' pages.
The following workshop-style lab was also held at CLEF 2011:

CHiC 2011 – Cultural Heritage in CLEF: From Use Cases to Evaluation in Practice for Multilingual Information Access to Cultural Heritage: This workshop aimed to survey use cases for information access to cultural heritage materials and to review evaluation initiatives and approaches in order to identify future opportunities for novel evaluation experiments and measures. Workshop participants were asked to introduce their ideas for possible evaluation scenarios.

4 Acknowledgements

We would like to thank the lab selection committee for their assistance with reviewing proposals and in the selection process: Martin Braschler, Zurich University of Applied Sciences, Switzerland; Gregory Grefenstette, Exalead, France; Jussi Karlgren, Swedish Institute of Computer Science, Sweden; Doug Oard, College of Information Studies, University of Maryland, USA; Mark Sanderson, Royal Melbourne Institute of Technology University, Australia; and Ryen White, Microsoft, USA.

Last, but not least, we would like to thank everyone involved in the organisation, running and presentation of the labs, especially the lab organisers. Without their tireless and mostly voluntary efforts, the CLEF Labs would not be possible.