=Paper=
{{Paper
|id=Vol-1436/Paper90
|storemode=property
|title=Recording and Analyzing Benchmarking Results: The Aims of the MediaEval Working Notes Proceedings
|pdfUrl=https://ceur-ws.org/Vol-1436/Paper90.pdf
|volume=Vol-1436
|dblpUrl=https://dblp.org/rec/conf/mediaeval/LarsonJISG15
}}
==Recording and Analyzing Benchmarking Results: The Aims of the MediaEval Working Notes Proceedings==
Martha Larson (1), Gareth Jones (2), Bogdan Ionescu (3), Mohammad Soleymani (4), Guillaume Gravier (5)

(1) Delft University of Technology, Netherlands; (2) Dublin City University, Ireland; (3) University Politehnica of Bucharest, Romania; (4) University of Geneva, Switzerland; (5) CNRS IRISA and Inria Rennes, France

m.a.larson@tudelft.nl, gareth.jones@computing.dcu.ie, bionescu@imag.pub.ro, mohammad.soleymani@unige.ch, guig@irisa.fr

Copyright is held by the author/owner(s). MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany.

ABSTRACT

We present an in-depth look at the structure and the strategy of the working notes proceedings that is published by the Benchmarking Initiative for Multimedia Evaluation (MediaEval) in conjunction with its yearly workshop. The proceedings records information on the tasks that were offered by the benchmark and the approaches that were developed by participants to address them. The proceedings is called a ‘working notes’ because its aim is to support work in progress. Specifically, it presents analyses of participants’ algorithms for discussion at the workshop. This year, in addition to sections devoted to each of the tasks, we are piloting a new section called MediaEval Letters, for papers that transcend individual tasks or years of the benchmark.

1. INTRODUCTION

MediaEval is a benchmarking initiative dedicated to evaluating new algorithms for multimedia access and retrieval. It organizes an annual campaign, which offers challenges to the multimedia research community. The challenges take the form of tasks that invite the exploitation of multiple modalities (i.e., speech and music, visual content, textual metadata, context). MediaEval’s focus on human and social aspects of multimedia sets it apart from other benchmarks.

MediaEval is a ‘bottom-up benchmark’: it is not run by a central project or institution. Instead, tasks are proposed and organized by autonomous groups of task organizers. Proposed tasks are accepted into the benchmark on the basis of a community-wide survey that determines ‘grassroots’ interest in participating in the task. If the survey reveals a sufficient level of ‘demand’ for the task, the task is vetted for viability and is offered by the benchmark.

The responsibility for vetting tasks lies in the hands of the MediaEval Community Council, a group of volunteers. In 2015, the MediaEval Community Council comprised the authors of this paper. The Council also works to ensure that the tasks offered are truly ‘MediaEval tasks’, in that they involve human and social aspects, encourage multimodal solutions, and complement other benchmarks.

A guiding principle for MediaEval is ‘less is more’. Each task must commit itself to a single, official evaluation metric (or evaluation procedure), and each participating team may submit no more than five different algorithms (referred to as ‘runs’) that address a task. These limitations force the task organizers to clearly formulate the goal of a task for a given year, and force participants to commit in advance to the methods that they find most promising.

This pressure keeps the community focused on the motivation behind their research and development activities. Tasks are related to user needs that arise within specific use scenarios for multimedia technology. The needs must be defined clearly enough such that task solutions can be meaningfully compared using a single evaluation procedure. Task participants do not generate solutions blindly, but rather pursue only those solutions that they find to be most promising.

The yearly MediaEval benchmarking cycle concludes with a workshop that brings together task participants to report on and discuss the current tasks and to plan for the future. The workshop publishes a proceedings consisting of working notes papers. This proceedings serves as a record of the output of the benchmark in any given year, including descriptions of the tasks and the solutions offered, as well as analysis of the performance and effectiveness of these solutions. The purpose of this paper is to describe the structure and the strategy of the working notes proceedings, and to introduce MediaEval Letters, a new section of the proceedings that is being piloted in 2015.

2. THE WORK OF WORKING NOTES

MediaEval is a benchmark that follows in the tradition of evaluation campaigns in the information retrieval community. It was established in 2008, as a track of the first generation of the CLEF campaign (then called ‘Cross-Language Evaluation Forum’), which ran 2000–2009 [1]. This campaign compiled a working notes that was made available to participants at the yearly workshop; examples include [3, 4]. Later, a proceedings volume with revised selected papers from the benchmarking year was published.

MediaEval adopted the CLEF practice of publishing a working notes. Another example of a similar practice is the ‘notebook’ published each year by the TRECVid campaign [6]. The current generation of the CLEF campaign (now called ‘Conference and Labs of the Evaluation Forum’) publishes a conference proceedings. See [2] for a complete list of CLEF proceedings.

The form of the MediaEval working notes proceedings follows directly from its intended function. The function of the working notes is to ‘freeze’ the participants’ approaches to the tasks at the moment of the run submission deadline. Working notes papers contain the information necessary to support discussion of participants’ approaches at the workshop, as well as reproduction of their approaches after the workshop. They should describe the algorithms used in the individual runs, report the scores achieved with respect to the official evaluation metric, and analyze the strengths and weaknesses of the approach. They may also discuss runs beyond the five submitted runs, or present evaluation with respect to alternate evaluation metrics. On the basis of the information in the working notes papers, it should be possible to understand what the most promising approaches to a task are, and how the task (and/or the evaluation methodology) might evolve in future years.

3. MEDIAEVAL WORKING NOTES

The MediaEval working notes proceedings consists of sections devoted to the individual tasks. Each section begins with the task overview paper, and is followed by the papers of the participants. Papers are intentionally kept short (two pages), which serves several functions. First, it forces authors to think carefully about the task, and to convey only the most important information or insights to the readers. Second, it allows the people involved in the task to get a quick overview of what was done in the task, because the papers are short and easy to read. A team of editors is drawn from among the task organizers, and coordinates the peer review and revision process that ensures the quality of the proceedings.

The overview paper explains the objective of the task and gives the task definition, which provides a specification of the challenge to be addressed. It describes the use scenario that motivates the task, and discusses related work. If the task has been offered before, the overview paper explains the relationship of the current task to previous editions in past years. In many cases, the overview paper will also offer an outlook for further development of the task in future years.

Participant papers focus on the approaches developed by the participating teams to address the tasks. They provide a description of the approach chosen by a team, and explain why this choice was made. The paper should explain succinctly the novelty of the approach, and/or the main insight on which the authors build. The participants’ papers cite the overview paper for the task. In this way, participants avoid repeating the entire description and motivation of the task in their own papers, and can focus instead on their own algorithms.

The participant paper also covers the related work that is relevant to the team’s specific approach to the task. In 2015, we extended the length of the papers to include a third page containing only references. This was done in order to make sure that the short length of the paper did not force the authors to compromise on explaining how their approach is related to existing approaches.

The short length of MediaEval working notes papers is part of the overall ‘less is more’ strategy. Limited space encourages authors to focus on the most essential details. This also helps to promote their status as ‘work in progress’. After the workshop, participating teams are encouraged to bring their work to full maturity, and submit it for publication at a mainstream venue. In many cases, groups consisting of organizers and also task participants form during the workshop, and go on to author joint publications about the task. In 2015, the MediaEval Working Notes proceedings will be published with CEUR-WS.org for a fifth year. CEUR-WS.org allows the rights to the papers to remain with the authors, further encouraging follow-up publications.

4. GAINS GOING BEYOND ‘WINNING’

At the workshop, the task organizers present a ranked list of the runs that were submitted to the task, and a winner is declared. However, MediaEval scrupulously avoids emphasizing ‘winning’ as the main goal of participation in the benchmark. Over-focus on achieving the top score discourages participants from taking risks. Without risks, the algorithms that are proposed to address a task are in danger of converging on a local optimum, as participants seek only incremental improvements to the best approaches of past years. As such, a mark of the value of the contribution of a participating team to the benchmark is the quality of their working notes paper. In order to highlight innovation and productive risk taking, MediaEval chooses a certain number of teams each year to receive an MDM (MediaEval Distinctive Mention). MDMs are used by the task organizers to point out submissions that they see as having particularly high promise, but that did not achieve top-ranking scores.

The focus on working notes paper quality, rather than on winning, is also important in order to highlight task organizers’ solutions to their own tasks. As an outward symbol of the level playing field that MediaEval meticulously maintains, runs submitted by organizers are excluded from the official ranking. The working notes paper is the opportunity for the task organizers to allow their own approaches to stand out.
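As a purely illustrative sketch, not MediaEval software, the following Python snippet shows how an official ranking of this kind could be assembled: runs are ordered by the task’s single official metric, and runs flagged as organizer submissions are left out of the ranking, as described above. All team names, run identifiers, field names, and scores are hypothetical.

    # Illustrative sketch only (hypothetical data, not MediaEval tooling):
    # rank submitted runs by the official metric, excluding organizers' runs.
    from dataclasses import dataclass

    @dataclass
    class Run:
        team: str
        run_id: str
        score: float          # value of the task's single official metric
        organizer_run: bool   # True if the run was submitted by the task organizers

    def official_ranking(runs):
        """Return non-organizer runs sorted by score, best first."""
        eligible = [r for r in runs if not r.organizer_run]
        return sorted(eligible, key=lambda r: r.score, reverse=True)

    if __name__ == "__main__":
        runs = [
            Run("TeamA", "run1", 0.71, False),
            Run("TeamB", "run3", 0.68, False),
            Run("Organizers", "baseline", 0.74, True),  # presented at the workshop, not ranked
        ]
        for rank, r in enumerate(official_ranking(runs), start=1):
            print(rank, r.team, r.run_id, r.score)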
5. MEDIAEVAL LETTERS

The MediaEval community produces results and insights that often go beyond a single working notes paper of a given task in a given year. In 2015, we are piloting a section of the MediaEval working notes called MediaEval Letters to provide a venue for publication of such papers. Although a MediaEval Letters paper may be on any topic, we are particularly interested in promoting several topics:

• Reproducibility and replication: Insights gained from reimplementation of algorithms from past MediaEval working notes.
• Best Practices: Proposals for extending the effectiveness or usefulness of the benchmark, for example, Adam Rae’s 2012 talk on code sharing [5].
• Evaluation Methodology and Metrics: New ways of evaluating tasks. Such contributions are necessary to keep pace with task innovation.
• New Tasks: Proposals or proofs-of-concept for future MediaEval tasks.
• ‘MediaEval history’: Studies devoted to tracking where we have been, such as bibliographic studies or studies devoted to measuring the impact of MediaEval.

The papers in the MediaEval Letters section are reviewed by the MediaEval Community Council, who may ask for support from other community members. A Letters paper is considered successful if it triggers productive discussion in the community during the writing and review process as well as after publication. Moving forward, we hope the Letters section will contribute to making the MediaEval working notes proceedings useful and informative.

6. REFERENCES

[1] Cross Language Evaluation Forum. http://www.clef-campaign.org (accessed Sept. 2015).
[2] The CLEF Initiative: Proceedings. http://www.clef-initiative.eu/publication/proceedings (accessed Sept. 2015).
[3] Cross Language Evaluation Forum. Working Notes for the CLEF 2008 Workshop, 2008. http://www.clef-campaign.org/2008/working_notes/CLEF2008WN-Contents.html (accessed Sept. 2015).
[4] Cross Language Evaluation Forum. Working Notes for the CLEF 2009 Workshop, 2009. http://www.clef-campaign.org/2009/working_notes/CLEF2009WN-Contents.html (accessed Sept. 2015).
[5] A. Rae. MediaEval Code of Conduct, 2012. http://www.slideshare.net/adamrae/code-sharing (accessed Sept. 2015).
[6] TRECVID: TREC Video Retrieval Evaluation Notebook Papers and Slides. http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html (accessed Sept. 2015).