<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Petra Galuščáková</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Pecina</string-name>
          <email>pecina@ufal.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Prague</institution>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>In this paper, we describe our participation in the Search part of the Search and Hyperlinking Task in the MediaEval 2014 Benchmark. In our experiments, we compare two types of segmentation: fixed-length segmentation and segmentation employing Decision Trees trained on a set of various features. We also show the usefulness of exploiting metadata and explore the removal of overlapping retrieved segments.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The main aim of the Search sub-task is to find video
segments relevant to a given textual query. This problem is
an important part of the Spoken Content Retrieval [
        <xref ref-type="bibr" rid="ref10 ref8">8, 10</xref>
        ]
research area, which has been emerging in recent years.
      </p>
      <p>
        All experiments presented in this paper were conducted
on the BBC Broadcast data. A total of 1335 hours of video
was available for training and 2686 hours for testing. We
exploited subtitles, automatic speech recognition (ASR)
transcripts by LIMSI [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], LIUM [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and NST-Sheffield [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], all
available for the task. Detailed information about the task
and data can be found in the task description [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>SYSTEM DESCRIPTION</title>
      <p>
        Based on the results of our previous experiments [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we
employed the Terrier IR system and its implementation of
the Hiemstra language model [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] with stemming and
stopword removal.
      </p>
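      <p>As a rough illustration of the retrieval model, the following Python sketch scores a document with a linearly interpolated language model in the spirit of Hiemstra [5]. The smoothing constant c, the helper names, and the toy data structures are our own assumptions, not the actual Terrier configuration used in the experiments.</p>
      <preformat preformat-type="code">
import math
from collections import Counter

def hiemstra_lm_score(query_terms, doc_terms, collection_tf, collection_len, c=0.15):
    """Hedged sketch: each query term matching the document contributes
    log(1 + (c * tf_d * |C|) / ((1 - c) * tf_C * |d|)), i.e. a linear
    interpolation of document and collection term statistics. The value
    of c is an assumed default, not taken from the paper."""
    tf_d = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        tf_c = collection_tf.get(term, 0)
        if tf_d[term] == 0 or tf_c == 0:
            continue  # non-matching terms add nothing in this formulation
        score += math.log(1.0 + (c * tf_d[term] * collection_len) /
                          ((1.0 - c) * tf_c * len(doc_terms)))
    return score
</preformat>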
      <p>
        Two strategies were used to segment the
recordings: 1) we divided the video recordings into segments of
fixed length, and 2) we used a segmentation system that
employs Decision Trees (DT) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This system makes use of
several features, including cue word n-grams (word n-grams
frequently occurring at segment boundaries, e.g. “if”,
“I’m”, “especially”, “the”), cue tag n-grams (tag n-grams
frequently occurring at segment boundaries, e.g. “VBP
PRP VBG”), silence between words, the division given in the
transcripts, and the output of the TextTiling algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For
each word in the transcript, the system decides whether a segment
ends after this word or not. The created segments may
overlap. The system was trained on the data from the Similar
Segments in Social Speech Task at MediaEval 2013 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
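      <p>A minimal sketch of the per-word boundary decision is given below; the feature names, the classifier settings, and the training-data layout are illustrative assumptions rather than the authors' implementation, and the sketch only shows the kind of features listed above (the overlap handling of the actual system is omitted).</p>
      <preformat preformat-type="code">
# Hedged sketch of Decision Tree based segmentation: for every word we
# build a feature vector and let the tree decide whether a segment ends
# after that word. Feature names below are illustrative assumptions.
from sklearn.tree import DecisionTreeClassifier

FEATURES = [
    "cue_word_ngram_score",  # strength of boundary cue word n-grams around the word
    "cue_tag_ngram_score",   # the same for POS-tag n-grams (e.g. "VBP PRP VBG")
    "silence_after_word",    # length of the pause following the word (seconds)
    "transcript_break",      # 1 if the transcript itself starts a new unit here
    "texttiling_depth",      # depth score produced by the TextTiling algorithm [4]
]

def train_boundary_classifier(feature_vectors, boundary_labels):
    """boundary_labels[i] is 1 if a segment ends after word i, else 0;
    labels are taken from the training data (Similar Segments in
    Social Speech Task data [11])."""
    tree = DecisionTreeClassifier(max_depth=8, class_weight="balanced")
    return tree.fit(feature_vectors, boundary_labels)

def segment_transcript(tree, words, feature_vectors):
    """Cut the transcript after every word predicted to be a boundary."""
    segments, current = [], []
    for word, is_boundary in zip(words, tree.predict(feature_vectors)):
        current.append(word)
        if is_boundary:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments
</preformat>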
      <p>Based on our previous experiments, we set the segment
length in the fixed-length segmentation to 60 seconds and
the shift between the overlapping segments to 10 seconds.
The segment length applied in the segmentation system was
tuned on the training data and set to 50 and 120
seconds for the Search sub-task. We also experimented with
post-filtering of the retrieved segments: we either used all
the retrieved segments, or we removed segments which
partially overlapped with another higher-ranked segment.</p>
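      <p>The following sketch shows both pieces: generating the 60-second windows with a 10-second shift, and the post-filtering that drops retrieved segments overlapping a higher-ranked one. The data structures, and the reading of "higher ranked" as "higher ranked among the segments already kept", are our assumptions.</p>
      <preformat preformat-type="code">
def fixed_length_segments(duration, length=60.0, shift=10.0):
    """Overlapping (start, end) windows over one recording: starts at
    0, 10, 20, ... seconds, each window 60 seconds long (values from
    the fixed-length setting described above)."""
    starts = [i * shift for i in range(max(1, int(duration // shift)))]
    return [(s, min(s + length, duration)) for s in starts]

def overlap_length(a_start, a_end, b_start, b_end):
    """Length of the time overlap of two intervals, 0.0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def remove_overlapping(ranked_segments):
    """Post-filtering sketch: walk the ranked list (best rank first) and
    keep a segment only if it does not partially overlap an already kept
    segment from the same recording."""
    kept = []
    for video, start, end in ranked_segments:
        clashes = [1 for v, s, e in kept
                   if v == video and overlap_length(start, end, s, e) != 0.0]
        if not clashes:
            kept.append((video, start, end))
    return kept
</preformat>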
      <p>We also employed the metadata provided for the task.
For each recording, we extracted the title, episode title,
description, short episode synopsis, service name, and program
variant, and appended this text to each segment from that
recording.</p>
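      <p>A short sketch of this metadata expansion is shown below; the field names follow the list above, while the metadata lookup itself and the exact field identifiers in the BBC collection are assumptions.</p>
      <preformat preformat-type="code">
# Hedged sketch: recording-level metadata text is appended to every
# segment of that recording before indexing. Field names are illustrative.
METADATA_FIELDS = ["title", "episode_title", "description",
                   "short_synopsis", "service_name", "program_variant"]

def expand_segment_text(segment_text, recording_metadata):
    """recording_metadata: dict of metadata strings for one recording."""
    extra = " ".join(recording_metadata.get(field, "")
                     for field in METADATA_FIELDS)
    return segment_text + " " + extra
</preformat>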
    </sec>
    <sec id="sec-3">
      <title>RESULTS</title>
      <p>
        The results for the Search sub-task are given in Table 1.
We present scores of six evaluation measures: Mean Average
Precision (MAP), Precision at 5 (P5), Precision at 10 (P10),
Precision at 20 (P20), Binned Relevance (MAP-bin), and
Tolerance to Irrelevance (MAP-tol) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
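      <p>For reference, the sketch below shows the two standard measures (precision at a cutoff and the average precision underlying MAP) computed from a ranked list of binary relevance judgements; MAP-bin and MAP-tol follow the segment-based adaptations defined in [1] and are not reproduced here.</p>
      <preformat preformat-type="code">
def precision_at_k(relevances, k):
    """relevances: 0/1 judgements of the retrieved segments in rank order."""
    return sum(relevances[:k]) / float(k)

def average_precision(relevances, num_relevant):
    """Average of the precision values at the ranks of relevant items,
    normalised by the total number of relevant items for the query."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            total += hits / float(rank)
    return total / max(1, num_relevant)
</preformat>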
      <p>Unsurprisingly, the best results are achieved in the
experiments using subtitles. Generally, most of the results
obtained with the LIMSI transcripts are higher than the
corresponding results with the LIUM and NST-Sheffield
transcripts; the only exceptions are the experiments employing
overlapping segments. The results with the NST-Sheffield
transcripts are higher than the corresponding results with
the LIUM transcripts.</p>
      <p>In most cases, concatenating the segment text
with the metadata improved the results, despite a drop in the
P5 score for all types of transcripts. Apart from several
values of P and MAP-bin for the LIUM transcripts, the
fixed-length segmentation outperforms the Decision Trees-based
segmentation with 120-second-long segments. Though the
50-second-long segments created using Decision Trees
notably outperform the fixed-length segments in terms of
MAP and the precision-based measures, they are outperformed
by the fixed-length segmentation in terms of the MAP-bin and
MAP-tol measures.</p>
      <p>All measures, except MAP-tol, are notably
higher in the experiments in which we did not remove
partially overlapping segments from the list of retrieved
segments. Due to the nature of these measures, it is not
possible to distinguish whether a user has already seen the
retrieved segment or not. Therefore, all the relevant segments,
which frequently overlap each other, increase the score. The
MAP-tol measure is not influenced by this behavior, as it
takes into account only the relevant content which has not
already been seen by a user. Therefore, the highest MAP-tol
scores are achieved for the fixed-length segmentation when
the overlapping retrieved segments are removed.</p>
    </sec>
    <sec id="sec-4">
      <title>CONCLUSION</title>
      <p>In the Search sub-task, we experimented with
subtitles and three ASR transcripts. The
subtitles outperformed all the ASR transcripts used. However,
the LIMSI transcripts also generally scored well and
slightly outperformed the NST-Sheffield transcripts. The
LIUM transcripts achieved the lowest scores in most
cases. Moreover, we have confirmed the usefulness of the
metadata and the effectiveness of simple segmentation into
fixed-length segments.</p>
      <p>We have also pointed out the problems with partially
overlapping segments occurring in the results. Such segments
can greatly increase MAP scores; however, they cannot be
expected to be helpful to users. Therefore, the MAP-tol
measure may be preferred in such cases.</p>
    </sec>
    <sec id="sec-5">
      <title>ACKNOWLEDGMENTS</title>
      <p>This research is supported by the Czech Science
Foundation, grant number P103/12/G084, Charles University
Grant Agency GA UK, grant number 920913, and by SVV
project number 260 104.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Aly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Eskevich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ordelman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>Adapting Binary Information Retrieval Evaluation Metrics for Segment-based Retrieval Tasks</article-title>
          . CoRR, abs/1312.1913,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eskevich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. N.</given-names>
            <surname>Racca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ordelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>The Search and Hyperlinking Task at MediaEval 2014</article-title>
          .
          <source>In Proc. of MediaEval</source>
          , Barcelona, Spain,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          .
          <article-title>Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visual Documents</article-title>
          .
          <source>In Proc. of ICMR</source>
          , pages
          <fpage>217</fpage>
          -
          <lpage>224</lpage>
          , Glasgow, UK,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hearst</surname>
          </string-name>
          .
          <article-title>TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>23</volume>
          (
          <issue>1</issue>
          ):
          <fpage>33</fpage>
          -
          <lpage>64</lpage>
          , Mar.
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          .
          <article-title>Using Language Models for Information Retrieval</article-title>
          .
          <source>PhD thesis</source>
          , University of Twente, Enschede, Netherlands,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Lamel</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.-L.</given-names>
            <surname>Gauvain</surname>
          </string-name>
          .
          <article-title>Speech Processing for Audio Indexing</article-title>
          .
          <source>In Proc. of GoTAL</source>
          , pages
          <fpage>4</fpage>
          -
          <lpage>15</lpage>
          , Gothenburg, Sweden,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lanchantin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-J.</given-names>
            <surname>Bell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-J.-F.</given-names>
            <surname>Gales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Quinnell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Renals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Saz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-S.</given-names>
            <surname>Seigel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Swietojanski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.-C.</given-names>
            <surname>Woodland</surname>
          </string-name>
          .
          <article-title>Automatic Transcription of Multi-genre Media Archives</article-title>
          .
          <source>In Proceedings of SLAM Workshop</source>
          , pages
          <fpage>26</fpage>
          -
          <lpage>31</lpage>
          , Marseille, France,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Larson</surname>
          </string-name>
          and
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>Spoken Content Retrieval: A Survey of Techniques and Technologies</article-title>
          , volume
          <volume>5</volume>
          of Found. Trends Inf. Retr., Now Publishers Inc., Hanover, MA, USA,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rousseau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Deléglise</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Estève</surname>
          </string-name>
          .
          <article-title>Enhancing the TED-LIUM Corpus with Selected Data for Language Modeling and More TED Talks</article-title>
          .
          <source>In Proc. of LREC</source>
          , pages
          <fpage>3935</fpage>
          -
          <lpage>3939</lpage>
          , Reykjavik, Iceland,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rüger</surname>
          </string-name>
          .
          <source>Multimedia Information Retrieval</source>
          . Synthesis Lectures on Information Concepts, Retrieval, and Services. Morgan &amp; Claypool Publishers, San Rafael, CA, USA,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N. G.</given-names>
            <surname>Ward</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Werner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Novick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. E.</given-names>
            <surname>Shriberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Oertel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-P.</given-names>
            <surname>Morency</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Kawahara</surname>
          </string-name>
          .
          <article-title>The Similar Segments in Social Speech Task</article-title>
          .
          <source>In Proc. of MediaEval</source>
          , Barcelona, Spain,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>