<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CUNI at MediaEval 2015 Search and Anchoring in Video Archives: Anchoring via Information Retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Petra Galuščáková</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pavel Pecina</string-name>
          <email>pecina@ufal.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University in Prague Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics Prague</institution>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
<p>In this paper, we deal with the automatic detection of anchoring segments in a collection of TV programmes. The anchoring segments are intended to serve as a basis for subsequent hyperlinking to other related video segments, and should therefore be appealing to the users of the collection. Using the hyperlinks, users can easily navigate through the collection and find more information about topics of their interest. We present two approaches: one based on metadata, the other based on the frequencies of proper names and numbers contained in the segments. Both approaches proved to be helpful for different aspects of the anchoring problem: segments which contain many proper names and numbers are interesting for the users, while segments most similar to the video description are highly informative.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
<p>Thanks to the rapid growth of available multimedia data, the sizes
of speech and video archives are increasing quickly. The amount of
information stored in such archives is enormous, but navigating them is
often tedious. Special attention should therefore be paid to designing
and tuning methods for effective navigation in large multimedia archives.
Effective navigation not only helps users find the required information
easily, but also lets them discover additional information about topics
of their interest through exploratory search.</p>
      <p>
The Search and Anchoring in Video Archives task
organized in the MediaEval 2015 Benchmark [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] explores effective navigation methods for large video archives.
In this paper, we deal with its Anchoring sub-task, whose main purpose is
to automatically label anchoring segments. An anchoring segment should be
in some way remarkable for the archive users, so that they may be
interested in finding additional information about it. Hyperlinking [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] can then be applied to the selected segments to retrieve more
segments similar to the anchoring ones. The user can thus easily browse
the archive using the links between anchoring segments and related data.
      </p>
    </sec>
    <sec id="sec-2">
<title>2. SYSTEM DESCRIPTION</title>
      <p>
We used the subtitles available for each programme in all our
experiments. All videos were first segmented into shorter passages, each
of which could serve as an anchoring segment. The passages were created
either at regular intervals, with a 60-second segment starting every
10 seconds (Fixed Segmentation), or by machine-learning-based methods
which automatically detect probable segment ends (ML Segmentation) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The created segments were then sorted according to their
likelihood of being a segment of interest (an anchoring segment) for
the archive users.
      </p>
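<p>The fixed segmentation described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the representation of subtitles as (start, end, text) triples and the function name are assumptions.</p>

```python
def fixed_segments(subtitles, video_length, step=10.0, length=60.0):
    """Generate overlapping fixed-length candidate segments: a 60-second
    window starting every 10 seconds, holding all subtitle text that
    overlaps the window."""
    start = 0.0
    while start < video_length:
        end = min(start + length, video_length)
        # Collect subtitle lines overlapping the [start, end) window.
        text = " ".join(t for (s, e, t) in subtitles if s < end and e > start)
        yield (start, end, text)
        start += step

# Toy subtitle stream: (start_sec, end_sec, text).
subs = [(0.0, 5.0, "good evening"), (30.0, 35.0, "tonight in Prague"),
        (70.0, 75.0, "and finally the weather")]
segments = list(fixed_segments(subs, video_length=90.0))
```

With a 90-second video this yields nine overlapping candidate segments; the first covers 0–60 s and contains the first two subtitle lines.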
<p>We used two methods to sort the created segments. In the first
method, the manually created metadata available for each programme were
converted into queries. The metadata consist of the programme name and a
short programme description. An information retrieval (IR) system was
then used to sort the created segments according to their similarity to
the metadata describing the programme. The most similar segments are
expected to contain relevant content and to describe the video recording
well.</p>
<p>In the second method, the segments were sorted according to the
frequency of numbers (words containing at least one digit) and proper
names (words containing an upper-case character while the previous word
does not end with a period) in each segment. Segments containing more
proper names and numbers are expected to carry some kind of specific
information.</p>
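<p>A minimal reading of this counting heuristic can be sketched as follows; the function name and whitespace tokenization are illustrative assumptions, not the authors' implementation.</p>

```python
def count_names_and_numbers(text):
    """Count 'numbers' (tokens containing at least one digit) and likely
    'proper names' (tokens containing an upper-case character whose
    preceding token does not end with a period, i.e. the token is
    probably not sentence-initial)."""
    tokens = text.split()
    count = 0
    for i, tok in enumerate(tokens):
        if any(ch.isdigit() for ch in tok):
            count += 1  # a number
        elif any(ch.isupper() for ch in tok):
            if i > 0 and not tokens[i - 1].endswith("."):
                count += 1  # a likely proper name
    return count
```

For example, in "He met Alice in Prague on May 5." the heuristic counts Alice, Prague, May and the number 5, but skips the sentence-initial "He".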
      <p>
In all experiments, we used the Terrier IR Framework,
version 3.5 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and its implementation of the Hiemstra
Language Model [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for retrieval, with its smoothing parameter set to 0.35.
We used the Porter stemmer and Terrier's stopword list.
      </p>
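<p>The Hiemstra language model ranks a segment by interpolating the segment's own term distribution with the collection's; the parameter value 0.35 matches the setting above. The sketch below is a simplified standalone version of the scoring formula, not Terrier's implementation (which additionally applies stemming and stopword removal):</p>

```python
import math

def hiemstra_lm_score(query_terms, seg_tf, seg_len, coll_tf, coll_len,
                      lam=0.35):
    """Hiemstra language-model score of one segment: for each query term
    t with segment frequency tf and collection frequency ctf, add
    log(1 + lam * tf * coll_len / ((1 - lam) * ctf * seg_len)).
    Terms absent from the segment (or collection) contribute nothing."""
    score = 0.0
    for t in query_terms:
        tf, ctf = seg_tf.get(t, 0), coll_tf.get(t, 0)
        if tf and ctf:
            score += math.log(1.0 + (lam * tf * coll_len) /
                              ((1.0 - lam) * ctf * seg_len))
    return score

# Toy counts: a 120-token segment in a 50,000-token collection.
seg_tf = {"prague": 2, "castle": 1}
coll_tf = {"prague": 40, "castle": 12}
score = hiemstra_lm_score(["prague", "castle"], seg_tf, 120, coll_tf, 50000)
```

Segments whose terms are frequent locally but rare in the collection receive higher scores, which is what makes the metadata-derived queries pick out the most characteristic segments.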
<p>The information retrieval system was also used when sorting the
segments by proper-name and number frequencies. As we plan to combine the
score from metadata-based retrieval with the segment preference given by
the proper-name and number frequencies, we first ran the retrieval in the
same way as when only metadata were used. The weights corresponding to
the frequencies of numbers and names were precalculated and then linearly
combined with the weight produced by the information retrieval system.
In our preliminary experiments, however, the combination weight was set
to strongly prefer the ordering given by the proper-name and number
frequencies.</p>
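<p>The combination step can be sketched as a weighted sum over (suitably normalized) scores; the interpolation weight below is hypothetical, chosen only to reflect the strong preference for the name/number ordering described above.</p>

```python
def combine(ir_score, name_num_weight, alpha=0.9):
    """Linear combination of the metadata-retrieval score and the
    precalculated proper-name/number weight. An alpha close to 1
    strongly prefers the name/number ordering; 0.9 is an illustrative
    value, not the one used in the experiments."""
    return alpha * name_num_weight + (1.0 - alpha) * ir_score

# Rank two hypothetical segments: (id, ir_score, name_num_weight).
candidates = [("seg1", 2.1, 0.2), ("seg2", 1.4, 0.8)]
ranked = sorted(candidates, key=lambda c: combine(c[1], c[2]), reverse=True)
```

With a skewed alpha, seg2 outranks seg1 despite its lower retrieval score, because it carries more proper names and numbers.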
    </sec>
    <sec id="sec-3">
<title>3. RESULTS</title>
<p>The officially evaluated experiments are displayed in
Table 1. The machine-learning-based segmentation proved to increase
Precision compared to the fixed-length segmentation. Given the large
number of overlapping retrieved segments, which is even larger for the
machine-learning-based segmentation, this Precision confirms the quality
of the top-retrieved results. The fixed-length segmentation without any
proper-name and number preference achieves the overall highest MRR
score. This may confirm our assumption that segments ranked highly by
the metadata-based retrieval are the most informative for the
recording.</p>
<p>The overall highest Precision and Recall scores are achieved when
preference is given to segments containing the largest numbers of proper
names and numbers. This confirms the assumption that a segment
containing many proper names and numbers will probably be interesting
for the archive user. Also, the overall largest number of segments
marked as relevant by the annotators was retrieved when we sorted the
created segments according to how many proper names and numbers they
contain.</p>
    </sec>
    <sec id="sec-4">
<title>4. CONCLUSION AND FUTURE WORK</title>
<p>In this paper, we presented several preliminary experiments with
segmentation methods and a selection process which can be used for
anchor selection. The high MRR score achieved by converting metadata
into queries and using them to retrieve the most informative parts of
the documents shows this approach to be suitable for anchor
selection.</p>
<p>The achieved Precision and Recall numbers show that combining this
approach with a segment preference based on the frequencies of proper
names and numbers may further improve the results. However, this
combination would require detailed tuning. The detection of proper names
should properly be handled by named entity recognition. The utilization
of other information, such as the frequency of content words or mentions
of places and persons, would also need further exploration.</p>
    </sec>
    <sec id="sec-5">
<title>5. ACKNOWLEDGMENTS</title>
      <p>This research is supported by the Czech Science
Foundation, grant number P103/12/G084, Charles University
Grant Agency GA UK, grant number 920913, and by SVV
project number 260 224.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eskevich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ordelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. N.</given-names>
            <surname>Racca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>SAVA at MediaEval 2015: Search and Anchoring in Video Archives</article-title>
          .
          <source>In Proc. of MediaEval</source>
          , Wurzen, Germany,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eskevich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. N.</given-names>
            <surname>Racca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ordelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>The Search and Hyperlinking Task at MediaEval 2014</article-title>
          .
          <source>In Proc. of MediaEval</source>
          , Barcelona, Spain,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kruliš</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lokoč</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          .
          <article-title>CUNI at MediaEval 2014 Search and Hyperlinking Task: Visual and Prosodic Features in Hyperlinking</article-title>
          .
          <source>In Proc. of MediaEval</source>
          , Barcelona, Spain,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Galuščáková</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          .
          <article-title>Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visual Documents</article-title>
          .
          <source>In Proc. of ICMR</source>
          , pages
          <fpage>217</fpage>
          -
          <lpage>224</lpage>
          , Glasgow, UK,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          .
          <article-title>Using language models for information retrieval</article-title>
          .
          <source>PhD thesis</source>
          , University of Twente, Enschede, Netherlands,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ounis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Amati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Plachouras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Macdonald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lioma</surname>
          </string-name>
          .
          <article-title>Terrier: A High Performance and Scalable Information Retrieval Platform</article-title>
          .
          <source>In Proc. of SIGIR</source>
          , Seattle, Washington, USA,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>