<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Recording and Analyzing Benchmarking Results: The Aims of the MediaEval Working Notes Proceedings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martha Larson</string-name>
          <email>m.a.larson@tudelft.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gareth Jones</string-name>
          <email>gareth.jones@computing.dcu.ie</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bogdan Ionescu</string-name>
          <email>bionescu@imag.pub.ro</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammad Soleymani</string-name>
          <email>mohammad.soleymani@unige.ch</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guillaume Gravier</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CNRS IRISA and Inria Rennes</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Delft University of Technology</institution>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Dublin City University</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University Politehnica of Bucharest</institution>
          ,
          <country country="RO">Romania</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Geneva</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>We present an in-depth look at the structure and strategy of the working notes proceedings published by the Benchmarking Initiative for Multimedia Evaluation (MediaEval) in conjunction with its yearly workshop. The proceedings records information on the tasks offered by the benchmark and on the approaches that participants developed to address them. It is called ‘working notes’ because its aim is to support work in progress. Specifically, it presents analyses of participants' algorithms for discussion at the workshop. This year, in addition to sections devoted to each of the tasks, we are piloting a new section called MediaEval Letters, for papers that transcend individual tasks or years of the benchmark.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>MediaEval is a benchmarking initiative dedicated to
evaluating new algorithms for multimedia access and retrieval.
It organizes an annual campaign, which offers challenges to
the multimedia research community. The challenges take
the form of tasks that invite the exploitation of multiple
modalities (e.g., speech and music, visual content, textual
metadata, context). MediaEval's focus on the human and social
aspects of multimedia sets it apart from other benchmarks.</p>
      <p>MediaEval is a ‘bottom-up benchmark’: it is not run by
a central project or institution. Instead, tasks are proposed
and organized by autonomous groups of task organizers.
Proposed tasks are accepted into the benchmark on the
basis of a community-wide survey that gauges ‘grassroots’
interest in participating in each task. If the survey reveals
sufficient ‘demand’ for a task, the task is vetted
for viability and then offered by the benchmark.</p>
      <p>The responsibility for vetting tasks lies with the
MediaEval Community Council, a group of volunteers. In
2015, the MediaEval Community Council comprised the
authors of this paper. The Council also works to ensure that
the tasks offered are truly ‘MediaEval tasks’, in that they
involve human and social aspects, encourage multimodal
solutions, and complement other benchmarks.</p>
      <p>A guiding principle for MediaEval is ‘less is more’. Each
task must commit itself to a single, official evaluation metric
(or evaluation procedure), and each participating team may
submit no more than five different algorithms (referred to
as ‘runs’) that address a task. These limitations force the
task organizers to clearly formulate the goal of a task for a
given year, and force participants to commit in advance to
the methods that they find most promising.</p>
      <p>This pressure keeps the community focused on the
motivation behind its research and development activities. Tasks
are related to user needs that arise within specific use
scenarios for multimedia technology. These needs must be defined
clearly enough that task solutions can be meaningfully
compared using a single evaluation procedure. Task
participants do not generate solutions blindly, but rather pursue
only those solutions that they find most promising.</p>
      <p>The yearly MediaEval benchmarking cycle concludes with
a workshop that brings together task participants to report
on and discuss the current tasks and to plan for the future.
The workshop publishes a proceedings consisting of
working notes papers. This proceedings serves as a record of the
output of the benchmark in a given year, including
descriptions of the tasks and the solutions offered, as well as
analyses of the performance and effectiveness of these
solutions. The purpose of this paper is to describe the structure
and strategy of the working notes proceedings, and to
introduce MediaEval Letters, a new section of the proceedings
that is being piloted in 2015.</p>
    </sec>
    <sec id="sec-2">
      <title>2. THE WORK OF WORKING NOTES</title>
      <p>
        MediaEval is a benchmark that follows in the tradition of
evaluation campaigns in the information retrieval
community. It was established in 2008 as a track of the first
generation of the CLEF campaign (then called ‘Cross-Language
Evaluation Forum’), which ran from 2000 to 2009 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This
campaign compiled working notes that were made available to
participants at the yearly workshop; examples include [
        <xref ref-type="bibr" rid="ref3 ref4">3,
4</xref>
        ]. Later, a proceedings volume with revised selected
papers from the benchmarking year was published.
      </p>
      <p>
        MediaEval adopted the CLEF practice of publishing working notes.
Another example of a similar practice is the `notebook'
published each year by the TRECVid campaign [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The current
generation of the CLEF campaign (now called `Conference
and Labs of the Evaluation Forum') publishes a conference
proceedings. See [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] for a complete list of CLEF proceedings.
      </p>
      <p>The form of the MediaEval working notes proceedings
follows directly from its intended function. The function of the
working notes is to `freeze' the participants' approaches to
the tasks at the moment of the run submission deadline.
Working notes papers contain the information necessary to
support discussion of participants' approaches at the
workshop, as well as reproduction of their approaches after the
workshop. They should: describe the algorithms used in the
individual runs, report the scores achieved with respect to
the official evaluation metric, and analyze the strengths and
weaknesses of the approach. They may also discuss runs
beyond the maximum of five submitted runs, or present evaluations with
respect to alternate evaluation metrics. On the basis of the
information in the working notes papers, it should be
possible to understand what the most promising approaches are
to a task, and how the task (and/or the evaluation
methodology) might evolve in future years.</p>
    </sec>
    <sec id="sec-3">
      <title>3. MEDIAEVAL WORKING NOTES</title>
      <p>The MediaEval working notes proceedings consists of
sections devoted to tasks. Each section begins with the task overview
paper, followed by the participants' papers.
Papers are intentionally kept short (two pages), which serves
two purposes. First, it forces authors to think carefully
about the task, and convey only the most important
information or insights to the readers. Second, it allows the
people involved in the task to get a quick overview of what was
done in the task, because the papers are short and easy to
read. A team of editors, drawn from among the task
organizers, coordinates the peer review and revision process
that ensures the quality of the proceedings.</p>
      <p>The overview paper explains the objective of the task
and the task definition, which provides a specification of
the challenge to be addressed. It describes the use scenario
that motivates the task, and discusses related work. If the
task has been offered before, the overview paper explains the
relationship of the current task to its editions in past
years. In many cases, the overview paper also offers an
outlook on further development of the task in future years.</p>
      <p>Participant papers focus on the approaches developed by
the participating teams to address the tasks. They provide
a description of the approach chosen by a team, and explain
why this choice was made. The paper should explain
succinctly the novelty of the approach, and/or the main insight
on which the authors build. The participants' papers cite
the overview paper for the task. In this way, participants
avoid repeating the entire description and motivation of the
task in their own papers, and can focus instead on their own
algorithms.</p>
      <p>Each participant paper covers the related work that is
relevant to the specific approach developed
by the team. In 2015, we extended the length
of the papers to include a third page containing only
references. This was done to ensure that the short
length of the paper did not force the authors to
compromise on explaining how their approach relates to existing
approaches.</p>
      <p>The short length of MediaEval working notes papers is
part of the overall ‘less is more’ strategy. Limited space
encourages authors to focus on the most essential details.
It also reinforces the papers' status as ‘work in progress’.
After the workshop, participating teams are encouraged to
bring their work to full maturity and submit it for
publication at a mainstream venue. In many cases, groups
of task organizers and participants form during
the workshop and go on to author joint publications about
the task. In 2015, the MediaEval Working Notes
proceedings will be published with CEUR-WS.org for the fifth year.
CEUR-WS.org allows the rights to the papers to remain with
the authors, further encouraging follow-up publications.</p>
    </sec>
    <sec id="sec-4">
      <title>4. GAINS GOING BEYOND ‘WINNING’</title>
      <p>At the workshop, the task organizers present a ranked list
of the runs that were submitted to the task, and a
winner is declared. However, MediaEval scrupulously avoids
emphasizing ‘winning’ as the main goal of participation in
the benchmark. Over-focus on achieving the top score
discourages participants from taking risks. Without risks, the
algorithms that are proposed to address a task are in danger
of converging on a local optimum, as participants seek only
incremental improvements to the best approaches of past
years. For this reason, the quality of a team's working notes paper,
rather than its rank, is a mark of the value of its contribution
to the benchmark. In order to highlight innovation and
productive risk taking, MediaEval chooses a certain number
of teams each year to receive an MDM (MediaEval
Distinctive Mention). Task organizers use MDMs to
point out submissions that they see as particularly
promising but that did not achieve top-ranking scores.</p>
      <p>The focus on working notes paper quality rather than on
winning is also important for highlighting task organizers'
solutions to their own tasks. As an outward symbol of the
level playing field that MediaEval meticulously maintains,
runs submitted by organizers are excluded from the official
ranking. The working notes paper is the opportunity for
task organizers to let their own approaches stand out.</p>
    </sec>
    <sec id="sec-5">
      <title>5. MEDIAEVAL LETTERS</title>
      <p>The MediaEval community produces results and insights
that often go beyond a single working notes paper of a given
task in a given year. In 2015, we are piloting a section of
the MediaEval working notes called MediaEval Letters to
provide a venue for publication of such papers. Although
a MediaEval Letters paper may be on any topic, we are
particularly interested in promoting the following topics.</p>
      <p>Reproducibility and Replication: Insights gained from
reimplementing algorithms from past MediaEval
working notes.</p>
      <p>
        Best Practices: Proposals for extending the
effectiveness or usefulness of the benchmark, for example, Adam
Rae's 2012 talk on code sharing [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Evaluation Methodology and Metrics: New ways of
evaluating tasks. Such contributions are necessary to keep
pace with task innovation.</p>
      <p>New Tasks: Proposals or proofs-of-concept for future
MediaEval tasks.</p>
      <p>‘MediaEval History’: Studies devoted to tracking where
we have been, such as bibliographic studies or studies
devoted to measuring the impact of MediaEval.</p>
      <p>The papers in the MediaEval Letters section are reviewed
by the MediaEval Community Council, who may ask for
support from other community members. A Letters paper
is considered successful if it triggers productive discussion in
the community during the writing and review process as well
as after publication. Moving forward, we hope the Letters
section will help make the MediaEval working
notes proceedings more useful and informative.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>Cross-Language Evaluation Forum</collab>
          . (accessed Sept.
          <year>2015</year>
          ). http://www.clef-campaign.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>The CLEF Initiative</collab>
          .
          <source>Proceedings</source>
          , (accessed Sept.
          <year>2015</year>
          ). http://www.clef-initiative.eu/publication/proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <collab>Cross-Language Evaluation Forum</collab>
          .
          <source>Working Notes for the CLEF 2008 Workshop</source>
          ,
          <year>2008</year>
          (accessed Sept.
          <year>2015</year>
          ). http://www.clef-campaign.org/2008/working_notes/CLEF2008WNContents.html.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <collab>Cross-Language Evaluation Forum</collab>
          .
          <source>Working Notes for the CLEF 2009 Workshop</source>
          ,
          <year>2009</year>
          (accessed Sept.
          <year>2015</year>
          ). http://www.clef-campaign.org/2009/working_notes/CLEF2009WNContents.html.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rae</surname>
          </string-name>
          .
          <source>MediaEval Code of Conduct</source>
          ,
          <year>2012</year>
          (accessed Sept.
          <year>2015</year>
          ). http://www.slideshare.net/adamrae/code-sharing.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <collab>TRECVID</collab>
          .
          <source>TREC Video Retrieval Evaluation Notebook Papers and Slides</source>
          , (accessed Sept.
          <year>2015</year>
          ). http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>