<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Out-of-the-box strategy for Rich Speech Retrieval @ MediaEval 2011</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Wouter Alink</string-name>
          <email>wouter@spinque.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Cornacchia</string-name>
          <email>roberto@spinque.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Spinque</institution>
          ,
          <addr-line>Utrecht</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2011</year>
      </pub-date>
      <fpage>1</fpage>
      <lpage>2</lpage>
      <abstract>
        <p>Evaluation tracks offer valuable opportunities to measure scientific and technological advances. Spinque approaches challenges such as the MediaEval Rich Speech Retrieval task with the additional goal of developing solutions that can easily be transferred from academic labs to industry. The system used during this evaluation was obtained with minimal effort and no manual optimisation, and yet it provides a reasonably good baseline to improve upon. More importantly, it is by nature an extensible approach, based on the concept of declarative search strategies, rather than an ad-hoc search system.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Our participation in the MediaEval Rich Speech
Retrieval task, described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], was inspired by the quest
for a simple, fast, robust, and effective approach to
searching in speech transcripts. We used our generic search
framework to instantiate a specific search solution for this
task, with the explicit goal of producing reasonable results
within a few hours, including index creation, search
strategy modelling and evaluation. As argued, for example,
in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], standard textual IR techniques can be applied to
speech transcripts, even when the transcripts are not
perfect. Our runs focus on textual search with different query
keyword combinations and with rank refinement at different
levels of retrieval unit granularity.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. SPINQUE FRAMEWORK</title>
      <p>We modelled and executed our runs as search strategies
within the Spinque framework. This is a prototype
environment where search processes are divided into two phases:
the search strategy definition and the actual search.</p>
      <p>Modelling search strategies in this framework corresponds
to designing graph structures, where edges represent
data flows consisting of terms, documents (e.g. speech transcripts),
and document sections. The nodes connected by such edges
are pre-defined, general-purpose operational blocks that
either provide source data (the speech transcripts and the
topics) or modify their input data flow by applying operations such
as extraction of specific sections from documents or ranking
of sections and documents, to name a few.</p>
      <p>Search strategies defined in this framework are
automati...</p>
    </sec>
    <sec id="sec-3">
      <title>3. DESCRIPTION</title>
      <p>The speech transcripts were indexed at two levels of
granularity: as whole documents as well as individual
SpeechSegment sections. We did not use the tags and the video
keyframes provided, nor any other source of evidence.</p>
      <p>Our runs can be described as follows:</p>
      <list list-type="simple">
        <list-item>
          <label>run1</label>
          <p>First, all words from title (weight 0.2) and all words
from short-title (weight 0.8) are used to search all
documents in the collection. Then, all the SpeechSegment
sections within those documents are searched using the
same keywords. The start of each retrieved section is returned as
the result. This strategy is depicted in Figure 1.</p>
        </list-item>
        <list-item>
          <label>run2</label>
          <p>The same as run1, except that all terms from title get a
weight of 0.0 and all terms from short-title get a weight
of 1.0. This effectively discards the terms from title.</p>
        </list-item>
        <list-item>
          <label>run3</label>
          <p>The same as run1, except that all terms from title get
a weight of 1.0 and all terms from short-title get a
weight of 0.0. This effectively discards the terms from
short-title. Run3 should be considered the "required
run".</p>
        </list-item>
      </list>
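      <p>As an illustration only, the two-stage idea behind run1 can be sketched in Python as follows. This is not the Spinque implementation: the function names, the data layout (a dictionary mapping a document id to a list of (start time, tokens) segments), the cut-offs top_docs and top_segments, and the simple weighted-overlap scorer standing in for the actual ranking function are all assumptions made for this sketch.</p>
      <preformat>
```python
def overlap_score(weighted_query, tokens):
    """Stand-in scorer (an assumption, not BM25): sum of query weights over tokens."""
    return sum(weighted_query.get(tok, 0.0) for tok in tokens)


def two_stage_search(weighted_query, documents, top_docs=10, top_segments=10):
    """Rank documents first, then rank the segments inside the retained documents.

    documents: {doc_id: [(start_seconds, [token, ...]), ...]}  (hypothetical layout)
    """
    # Stage 1: score each whole document on the concatenation of its segments.
    doc_scores = {}
    for doc_id, segments in documents.items():
        all_tokens = [tok for _, tokens in segments for tok in tokens]
        score = overlap_score(weighted_query, all_tokens)
        if score > 0:
            doc_scores[doc_id] = score
    best_docs = sorted(doc_scores, key=doc_scores.get, reverse=True)[:top_docs]

    # Stage 2: search the segments of those documents with the same keywords.
    hits = []
    for doc_id in best_docs:
        for start, tokens in documents[doc_id]:
            score = overlap_score(weighted_query, tokens)
            if score > 0:
                hits.append((score, doc_id, start))
    hits.sort(key=lambda hit: hit[0], reverse=True)

    # The start of each matching segment is what gets returned.
    return [(doc_id, start) for _, doc_id, start in hits[:top_segments]]
```
      </preformat>
      <p>In this sketch the title and short-title weights are assumed to be already folded into weighted_query (a mapping from term to weight). How the document-level and section-level scores are combined in the actual runs is not stated, so the sketch simply uses the first stage as a filter.</p>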
      <p>
        Textual ranking is performed with the BM25 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] retrieval
method, with standard parameters b = 0.75 and k1 = 1.2.
The weights 0.2 (words from title) and 0.8 (words from
short-title) were found as a local optimum using a
hill-climbing approach.
      </p>
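      <p>A minimal Python sketch of BM25 scoring with weighted query terms is given below. It assumes one common formulation of the BM25 term weight, with a smoothed, non-negative idf; the exact variant used in the runs is not stated in the text. The parameter values b = 0.75 and k1 = 1.2 and the 0.2 / 0.8 weights come from the text, while the function names and the way collection statistics are passed in are hypothetical.</p>
      <preformat>
```python
import math
from collections import Counter

K1 = 1.2   # term-frequency saturation parameter, as stated in the text
B = 0.75   # document-length normalisation parameter, as stated in the text


def mix_query(title_terms, short_title_terms, w_title=0.2, w_short=0.8):
    """Combine title and short-title keywords into one weighted query."""
    query = Counter()
    for term in title_terms:
        query[term] += w_title
    for term in short_title_terms:
        query[term] += w_short
    return dict(query)


def bm25_score(weighted_query, doc_tokens, doc_freq, n_docs, avg_len):
    """Score one retrieval unit (document or section) against a weighted query.

    doc_freq maps a term to the number of units containing it; n_docs and
    avg_len are collection statistics for the chosen unit granularity.
    """
    tf = Counter(doc_tokens)
    length_norm = 1.0 - B + B * len(doc_tokens) / avg_len
    score = 0.0
    for term, weight in weighted_query.items():
        freq = tf.get(term, 0)
        if freq == 0 or weight == 0.0:
            continue  # a zero weight discards the term, as in run2 and run3
        df = doc_freq.get(term, 0)
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        score += weight * idf * freq * (K1 + 1.0) / (freq + K1 * length_norm)
    return score
```
      </preformat>
      <p>Calling mix_query with weights (0.0, 1.0) or (1.0, 0.0) reproduces the keyword selections of run2 and run3 under these assumptions.</p>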
    </sec>
    <sec id="sec-4">
      <title>4. RESULTS AND FINDINGS</title>
      <p>The average time for retrieving results for a topic was
230 ms. This time includes "compiling" the search
strategy (i.e. translating it into SQL queries) out-of-the-box and
without manual optimisations, and the overhead for
generating the run files. A glitch later found in our indexer may
have altered results marginally: a few documents were not
included in our index and were therefore not retrieved.</p>
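      <p>The evaluation scores for the 3 submitted runs are shown
in Table 1. Scores have been measured with window sizes of
10, 30, and 60 seconds. Overall scores are reasonably
satisfying for a simple keyword-search approach. As expected,
the combination of both the title and the short-title yields
a better result than the individual runs. Best results were
found on the test set when assigning a larger weight to short-title
keywords, which suggests that full titles may carry off-topic
words that lower precision.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th colspan="2">Weights for</th>
              <th colspan="2">Window size (seconds)</th>
            </tr>
            <tr>
              <th>title</th>
              <th>short-title</th>
              <th>10</th>
              <th>30</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>0.2</td>
              <td>0.8</td>
              <td>0.1320</td>
              <td>0.2210</td>
            </tr>
            <tr>
              <td>0.0</td>
              <td>1.0</td>
              <td>0.1164</td>
              <td>0.1816</td>
            </tr>
            <tr>
              <td>1.0</td>
              <td>0.0</td>
              <td>0.1054</td>
              <td>0.1630</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>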
      <p>We found that searching short sections produced
disappointing rankings, probably because the document-length
normalisation was not fine-tuned. Both parameter configurations used
(for BM25 and for the title / short-title keyword mixture)
could be improved with a more exhaustive exploration of
their search space. The simplicity of the strategies used and
the small size of the corpus at hand make such an exploration
feasible here, which is not the case in general.</p>
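      <p>One simple form of such an exploration is a grid search over the title / short-title mixture, sketched below in Python. The sketch assumes that the two weights sum to 1 (as they do in all three submitted runs) and that a callback, here called evaluate, runs the strategy with the given weights and returns an evaluation score where higher is better. Both the callback and the step size are hypothetical.</p>
      <preformat>
```python
def grid_search_title_weight(evaluate, steps=10):
    """Try w_title = 0/steps, 1/steps, ..., 1 with w_short = 1 - w_title.

    evaluate(w_title, w_short) is a hypothetical callback returning a score
    (higher is better), e.g. measured on a development set of topics.
    """
    best_weight, best_score = None, float("-inf")
    for i in range(steps + 1):
        w_title = i / steps
        score = evaluate(w_title, 1.0 - w_title)
        if score > best_score:
            best_weight, best_score = w_title, score
    return best_weight, best_score
```
      </preformat>
      <p>Unlike hill climbing, this visits every grid point and so cannot stop at a local optimum of the grid, at the cost of one full evaluation per point. The same loop could be nested to cover the BM25 parameters as well.</p>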
      <p>One more direction for possible improvement is to
experiment with a more fine-grained zooming in, with search
windows of e.g. entire documents followed by 10-minute,
1-minute and 5-second stretches of speech. Such a multi-stage
strategy would likely retain recall and improve precision at every
iteration.</p>
    </sec>
    <sec id="sec-5">
      <title>5. CONCLUSIONS</title>
      <p>The main contribution of this paper is to show how a
specific search engine for speech transcripts of reasonable
quality can be instantiated with minimal effort. While
out-of-the-box text search is not unique to Spinque's framework,
the ability to play with retrieval units of different
granularities and to combine query and/or data sources easily is not
common.</p>
      <p>We plan to improve on our first speech retrieval
evaluation in two ways: firstly, by automating as much as possible
the optimisation of search strategies' free parameters,
including the choice of retrieval unit granularities; secondly,
by building on top of this optimised baseline with the
addition of more sources of evidence that may be available (such
as tags and video material).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.</given-names>
            <surname>Alink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cornacchia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.P.</given-names>
            <surname>de Vries</surname>
          </string-name>
          .
          <article-title>Searching CLEF-IP by strategy</article-title>
          . In
          <source>CLEF 2009, Revised Selected Papers</source>
          , Part I. Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>James</given-names>
            <surname>Allan</surname>
          </string-name>
          .
          <article-title>Perspectives on information retrieval and speech</article-title>
          .
          In
          <source>Information Retrieval Techniques for Speech Applications</source>
          , volume
          <volume>2273</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>323</fpage>
          -
          <lpage>326</lpage>
          . Springer Berlin / Heidelberg,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Eskevich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ordelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kofler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schmiedeke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.J.F.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>Overview of MediaEval 2011 Rich Speech Retrieval Task and Genre Tagging Task</article-title>
          . In MediaEval 2011 Workshop, Pisa, Italy, September 1-2,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hancock-Beaulieu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Gatford</surname>
          </string-name>
          .
          <article-title>Okapi at TREC-3</article-title>
          . In Third Text REtrieval Conference (TREC 1994),
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>