=Paper=
{{Paper
|id=Vol-1436/Paper90
|storemode=property
|title=Recording and Analyzing Benchmarking Results: The Aims of the MediaEval Working Notes Proceedings
|pdfUrl=https://ceur-ws.org/Vol-1436/Paper90.pdf
|volume=Vol-1436
|dblpUrl=https://dblp.org/rec/conf/mediaeval/LarsonJISG15
}}
==Recording and Analyzing Benchmarking Results: The Aims of the MediaEval Working Notes Proceedings==
Martha Larson (1), Gareth Jones (2), Bogdan Ionescu (3), Mohammad Soleymani (4), Guillaume Gravier (5)

(1) Delft University of Technology, Netherlands; (2) Dublin City University, Ireland; (3) University Politehnica of Bucharest, Romania; (4) University of Geneva, Switzerland; (5) CNRS IRISA and Inria Rennes, France

m.a.larson@tudelft.nl, gareth.jones@computing.dcu.ie, bionescu@imag.pub.ro, mohammad.soleymani@unige.ch, guig@irisa.fr

Copyright is held by the author/owner(s). MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany.

ABSTRACT

We present an in-depth look at the structure and the strategy of the working notes proceedings that is published by the Benchmarking Initiative for Multimedia Evaluation (MediaEval) in conjunction with its yearly workshop. The proceedings records information on the tasks that were offered by the benchmark and the approaches that were developed by participants to address them. The proceedings is called a ‘working notes’ because its aim is to support work in progress. Specifically, it presents analyses of participants’ algorithms for discussion at the workshop. This year, in addition to sections devoted to each of the tasks, we are piloting a new section called MediaEval Letters, for papers that transcend individual tasks or years of the benchmark.

1. INTRODUCTION

MediaEval is a benchmarking initiative dedicated to evaluating new algorithms for multimedia access and retrieval. It organizes an annual campaign, which offers challenges to the multimedia research community. The challenges take the form of tasks that invite the exploitation of multiple modalities (i.e., speech and music, visual content, textual metadata, context). MediaEval’s focus on human and social aspects of multimedia sets it apart from other benchmarks.

MediaEval is a ‘bottom-up benchmark’: it is not run by a central project or institution. Instead, tasks are proposed and organized by autonomous groups of task organizers. Proposed tasks are accepted into the benchmark on the basis of a community-wide survey that determines ‘grassroots’ interest in participating in the task. If the survey reveals a sufficient level of ‘demand’ for the task, the task is vetted for viability and is offered by the benchmark.

The responsibility for vetting tasks lies in the hands of the MediaEval Community Council, a group of volunteers. In 2015, the MediaEval Community Council comprised the authors of this paper. The Council also works to ensure that the tasks offered are truly ‘MediaEval tasks’, in that they involve human and social aspects, encourage multimodal solutions, and complement other benchmarks.

A guiding principle for MediaEval is ‘less is more’. Each task must commit itself to a single, official evaluation metric (or evaluation procedure), and each participating team may submit no more than five different algorithms (referred to as ‘runs’) that address a task. These limitations force the task organizers to clearly formulate the goal of a task for a given year, and force participants to commit in advance to the methods that they find most promising.

This pressure keeps the community focused on the motivation behind their research and development activities. Tasks are related to user needs that arise within specific use scenarios for multimedia technology. The needs must be defined clearly enough such that task solutions can be meaningfully compared using a single evaluation procedure. Task participants do not generate solutions blindly, but rather pursue only those solutions that they find to be most promising.

The yearly MediaEval benchmarking cycle concludes with a workshop that brings together task participants to report on and discuss the current tasks and to plan for the future. The workshop publishes a proceedings consisting of working notes papers. This proceedings serves as a record of the output of the benchmark in any given year, including descriptions of the tasks and the solutions offered, as well as analysis of the performance and effectiveness of these solutions. The purpose of this paper is to describe the structure and the strategy of the working notes proceedings, and to introduce MediaEval Letters, a new section of the proceedings that is being piloted in 2015.

2. THE WORK OF WORKING NOTES

MediaEval is a benchmark that follows in the tradition of evaluation campaigns in the information retrieval community. It was established in 2008, as a track of the first generation of the CLEF campaign (then called ‘Cross-Language Evaluation Forum’), which ran 2000–2009 [1]. This campaign compiled a working notes that was made available to participants at the yearly workshop; examples include [3, 4]. Later, a proceedings volume with revised selected papers from the benchmarking year was published.

MediaEval adopted the CLEF practice of publishing a working notes. Another example of a similar practice is the ‘notebook’ published each year by the TRECVid campaign [6]. The current generation of the CLEF campaign (now called ‘Conference and Labs of the Evaluation Forum’) publishes a conference proceedings. See [2] for a complete list of CLEF proceedings.

The form of the MediaEval working notes proceedings follows directly from its intended function. The function of the working notes is to ‘freeze’ the participants’ approaches to the tasks at the moment of the run submission deadline. Working notes papers contain the information necessary to support discussion of participants’ approaches at the workshop, as well as reproduction of their approaches after the workshop. They should describe the algorithms used in the individual runs, report the scores achieved with respect to the official evaluation metric, and analyze the strengths and weaknesses of the approach. They may also discuss runs beyond the five submitted runs, or present evaluation with respect to alternate evaluation metrics. On the basis of the information in the working notes papers, it should be possible to understand what the most promising approaches to a task are, and how the task (and/or the evaluation methodology) might evolve in future years.

3. MEDIAEVAL WORKING NOTES

The MediaEval working notes proceedings consists of sections devoted to the individual tasks. Each section begins with the task overview paper, and is followed by the papers of the participants. Papers are intentionally kept short (two pages), which serves several functions. First, it forces authors to think carefully about the task, and to convey only the most important information or insights to the readers. Second, it allows the people involved in the task to get a quick overview of what was done in the task, because the papers are short and easy to read. A team of editors is drawn from among the task organizers, and coordinates the peer review and revision process that ensures the quality of the proceedings.

The overview paper explains the objective of the task and gives the task definition, which provides a specification of the challenge to be addressed. It describes the use scenario that motivates the task, and discusses related work. If the task has been offered before, the overview paper explains the relationship of the current task to previous editions in past years. In many cases, the overview paper will also offer an outlook for further development of the task in future years.

Participant papers focus on the approaches developed by the participating teams to address the tasks. They provide a description of the approach chosen by a team, and explain why this choice was made. The paper should explain succinctly the novelty of the approach, and/or the main insight on which the authors build. The participants’ papers cite the overview paper for the task. In this way, participants avoid repeating the entire description and motivation of the task in their own papers, and can focus instead on their own algorithms.

The participant paper also covers the related work that is relevant to the team’s specific approach to the task. In 2015, we extended the length of the papers to include a third page containing only references. This was done in order to make sure that the short length of the paper did not force the authors to compromise on explaining how their approach is related to existing approaches.

The short length of MediaEval working notes papers is part of the overall ‘less is more’ strategy. Limited space encourages authors to focus on the most essential details. This also helps to promote their status as ‘work in progress’. After the workshop, participating teams are encouraged to bring their work to full maturity, and submit it for publication at a mainstream venue. In many cases, groups consisting of organizers and also task participants form during the workshop, and go on to author joint publications about the task. In 2015, the MediaEval Working Notes proceedings will be published with CEUR-WS.org for a fifth year. CEUR-WS.org allows the rights to the papers to remain with the authors, further encouraging follow-up publications.

4. GAINS GOING BEYOND ‘WINNING’

At the workshop, the task organizers present a ranked list of the runs that were submitted to the task, and a winner is declared. However, MediaEval scrupulously avoids emphasizing ‘winning’ as the main goal of participation in the benchmark. Over-focus on achieving the top score discourages participants from taking risks. Without risks, the algorithms that are proposed to address a task are in danger of converging on a local optimum, as participants seek only incremental improvements to the best approaches of past years. As such, a mark of the value of the contribution of a participating team to the benchmark is the quality of their working notes paper. In order to highlight innovation and productive risk taking, MediaEval chooses a certain number of teams each year to receive an MDM (MediaEval Distinctive Mention). MDMs are used by the task organizers to point out submissions that they see as having particularly high promise, but that did not achieve top-ranking scores.

The focus on working notes paper quality, rather than on winning, is also important in order to highlight task organizers’ solutions to their own tasks. As an outward symbol of the level playing field that MediaEval meticulously maintains, runs submitted by organizers are excluded from the official ranking. The working notes paper is the opportunity for the task organizers to allow their own approaches to stand out.
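As a purely illustrative sketch, not MediaEval software, the following Python snippet shows how an official ranking of this kind could be assembled: runs are ordered by the task’s single official metric, and runs flagged as organizer submissions are left out of the ranking, as described above. All team names, run identifiers, field names, and scores are hypothetical.

    # Illustrative sketch only (hypothetical data, not MediaEval tooling):
    # rank submitted runs by the official metric, excluding organizers' runs.
    from dataclasses import dataclass

    @dataclass
    class Run:
        team: str
        run_id: str
        score: float          # value of the task's single official metric
        organizer_run: bool   # True if the run was submitted by the task organizers

    def official_ranking(runs):
        """Return non-organizer runs sorted by score, best first."""
        eligible = [r for r in runs if not r.organizer_run]
        return sorted(eligible, key=lambda r: r.score, reverse=True)

    if __name__ == "__main__":
        runs = [
            Run("TeamA", "run1", 0.71, False),
            Run("TeamB", "run3", 0.68, False),
            Run("Organizers", "baseline", 0.74, True),  # presented at the workshop, not ranked
        ]
        for rank, r in enumerate(official_ranking(runs), start=1):
            print(rank, r.team, r.run_id, r.score)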
5. MEDIAEVAL LETTERS

The MediaEval community produces results and insights that often go beyond a single working notes paper of a given task in a given year. In 2015, we are piloting a section of the MediaEval working notes called MediaEval Letters to provide a venue for publication of such papers. Although a MediaEval Letters paper may be on any topic, we are particularly interested in promoting several topics:

• Reproducibility and replication: Insights gained from reimplementation of algorithms from past MediaEval working notes.
• Best Practices: Proposals for extending the effectiveness or usefulness of the benchmark, for example, Adam Rae’s 2012 talk on code sharing [5].
• Evaluation Methodology and Metrics: New ways of evaluating tasks. Such contributions are necessary to keep pace with task innovation.
• New Tasks: Proposals or proofs-of-concept for future MediaEval tasks.
• ‘MediaEval history’: Studies devoted to tracking where we have been, such as bibliographic studies or studies devoted to measuring the impact of MediaEval.

The papers in the MediaEval Letters section are reviewed by the MediaEval Community Council, who may ask for support from other community members. A Letters paper is considered successful if it triggers productive discussion in the community during the writing and review process as well as after publication. Moving forward, we hope the Letters section will contribute to making the MediaEval working notes proceedings useful and informative.

6. REFERENCES

[1] Cross Language Evaluation Forum. http://www.clef-campaign.org (accessed Sept. 2015).
[2] The CLEF Initiative: Proceedings. http://www.clef-initiative.eu/publication/proceedings (accessed Sept. 2015).
[3] Cross Language Evaluation Forum. Working Notes for the CLEF 2008 Workshop, 2008. http://www.clef-campaign.org/2008/working_notes/CLEF2008WN-Contents.html (accessed Sept. 2015).
[4] Cross Language Evaluation Forum. Working Notes for the CLEF 2009 Workshop, 2009. http://www.clef-campaign.org/2009/working_notes/CLEF2009WN-Contents.html (accessed Sept. 2015).
[5] A. Rae. MediaEval Code of Conduct, 2012. http://www.slideshare.net/adamrae/code-sharing (accessed Sept. 2015).
[6] TRECVID: TREC Video Retrieval Evaluation Notebook Papers and Slides. http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html (accessed Sept. 2015).