<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HITS and IRISA at MediaEval 2013: Search and Hyperlinking Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anca-Roxana Simon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guillaume Gravier</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pascale Sébillot</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>IRISA</string-name>
          <email>rstname.lastname@irisa.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>INRIA Rennes</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Camille Guinaudeau HITS Schloss-Wolfsbrunnenweg 35 D-69118 Heidelberg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Univ.</institution>
          <addr-line>Rennes 1</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <fpage>18</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>This paper describes our approach and results in the hyperlinking sub-task at MediaEval 2013. A two step method is implemented where the rst step consists in establishing a shortlist of relevant videos. In the second step, a target segment is selected from each video in the shortlist. We focus on target selection comparing two distinct strategies. The rst one exploits a bipartite graph relating utterances and words to nd the most relevant utterances from which segments are derived. The second one uses explicit topic segmentation, whether hierarchical or not, to select the target segments.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        We present the joint participation of HITS and IRISA to
the Search and Hyperlinking task at MediaEval 2013 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
limiting ourselves to the hyperlinking sub-task where one is
required to nd targets for hyperlinks whose source is a given
anchor. Similar to last year, we adopt a two step approach.
A shortlist of semantically related target videos is rst
established by comparing the anchor, possibly with context,
to entire videos using standard information retrieval
techniques. In the second step, we search for the most relevant
target segment within each video in the shortlist, respecting
the time constraints imposed.
      </p>
      <p>
        In 2013, we focused on the last step, i.e., the selection
of the most relevant target segment inside each video in the
shortlist of semantically related videos. We believe that
precise target selection is a crucial step for the hyperlinking
task: wrong timestamps within semantically related videos
can make the result useless even though the video is per
se relevant. However, previous work on the hyperlinking
sub-task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] mostly focused on linking anchors with relevant
videos but did not pay much attention to precise target
selection. We implemented two distinct approaches of which
several variants are compared. The rst approach relies on
a link analysis algorithm which exploits links in a graph to
propagate associations between words and utterances so as
to select a small number of utterances as the link target.
The second one relies on explicit topic segmentation to nd
out topically coherent targets closely related to the anchor,
extending last year's approach to hierarchical segmentation
and ne grain text alignment techniques.
2.
      </p>
    </sec>
    <sec id="sec-2">
      <title>SYSTEM DESCRIPTION</title>
      <p>
        We mostly exploit the transcripts provided [
        <xref ref-type="bibr" rid="ref3 ref6">3, 6</xref>
        ], which
were lemmatized, keeping only nouns, non modal verbs and
adjectives. Utterances are sentences for manual transcripts,
speech segments for LIMSI's and shots for LIUM's1.
2.1
      </p>
    </sec>
    <sec id="sec-3">
      <title>Shortlist of semantically related videos</title>
      <p>To limit detailed search for hyperlink targets given an
anchor, we rst establish a shortlist of the 50 most related
videos, considering each video as a whole. A vectorial
representation of transcripts is used for both anchors and videos,
adopting the BM25 weighting. When the context in which
the anchor appears is considered, a linear combination of the
BM25 weights obtained resp. from the anchor and from the
context is used, with a strong emphasis on the anchor (0.8
vs. 0.2). Videos are ranked in decreasing order according to
the cosine distance with the anchor (possibly with its
context), removing videos which contain the anchor (same le
or le corresponding to rebroadcasting of the same content).
The shortlist contains the top 50 videos to which we want to
relate the anchor and which are further processed to select
a precise and short enough hyperlink target.
2.2</p>
    </sec>
    <sec id="sec-4">
      <title>Selection of hyperlinks targets</title>
      <p>
        For each item in the top 50 related videos, we need to
extract the target segment for the link that will be
established with the anchor. According to evaluation rules, target
should be an excerpt with a duration between 10 s and 2 min.
Two approaches were taken, based on the same underlying
idea, i.e., nding the consecutive shots or utterances within
the given time constraints which are the most related to
the anchor. A rst approach relies on the hyperlink-induced
topic search (HITS) algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a link analysis method
used to weight each shot according to its relationships with
words from the anchor. A second approach implements topic
segmentation to nd out coherent segments which are
compared to the anchor.
      </p>
      <sec id="sec-4-1">
        <title>Target selection with link analysis.</title>
        <p>For a given shortlist video, link analysis relies on a
bipartite graph where the rst set of nodes represents utterances,
the second one representing words. Edges re ect the pairing
between words and utterances, i.e., an edge between
utterance Si and word Wj indicates that Wj appears in Si.
1Utterance boundaries being absent from LIUM's
transcripts, alignment with shot boundaries was performed.</p>
        <p>Exploiting the bipartite graph structure, the HITS
algorithm aims at assigning a score to each node n in the graph,
where the score indicates how well n is connected to the
others. HITS iteratively propagates scores via edges, taking
into account the importance of nodes connected to n. In
the framework of hyperlink target selection, the idea is to
give a high score to utterances that are connected to words
related to the anchor and its context. Scores in word nodes
are initialized with a value re ecting the word frequency in
the anchor, alone (HITSa) or with context (HITSc).
Frequent words increase the score of utterances containing such
words, in turn improving the score of words that appear in
the vicinity (i.e., the same utterance) of anchor words.</p>
        <p>After convergence of the HITS algorithm, a score is
obtained for each shot by adding the scores of all utterances
within the shot. Merging heuristics are nally used to yield
segments from which the best scoring one is picked as the
link target. Adjacent shots with a score above a threshold
are merged into a single segment if the result is less than
2 min long, adding scores. Short segments less than 10 s are
merged with the highest scoring neighbor.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Target selection with topic segmentation.</title>
        <p>As an alternative to link analysis, linear and hierarchical
topic segmentation is used to partition each video in the
shortlist into homogeneous segments. Each segment is
compared to the anchor, considered with its context in all topic
segmentation experiments, to nd the most signi cant one.</p>
        <p>
          Linear topic segmentation is achieved using [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], providing
a set of segments which exhibit high vocabulary coherence.
In the hierarchical approach, each segment resulting from
linear segmentation is again segmented using a criteria which
combines lexical cohesion and disruption [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] so as to avoid
over-segmentation. The idea of hierarchical segmentation
is to have smaller segments to relate to the anchor, thus
possibly more accurate targets.
        </p>
        <p>For each segment resulting either from linear or from
hierarchical topic segmentation, the similarity with the anchor
and its context is calculated. We investigate two distances.
The rst one is a classical cosine similarity measure
assuming tf-idf weights, thus relying on a bag of words
representation. This strategy was applied to linear (Linear+BoW)
and to hierarchical segmentation (Hierarchical+BoW).
To achieve better comparison, we also experimented n-gram
alignments, where similarity is computed between words,
bigrams and trigrams separately. Similarities from di erent
ngram orders are linearly combined with weights equal to 0.2,
0.3 and 0.5 for order 1, 2 and 3 respectively. N-gram
comparison was applied to linear segmentation (Linear+ngrams).</p>
        <p>The best scoring segment is used as target, applying the
following postprocessing rules to match time constraints.
Segments longer than 2 min are resegmented using a sliding
window of 2 min, taking the best scoring window within the
segment. Segments shorter than 10 s are combined with the
best scoring neighbor until the minimum length is reached.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>RESULTS</title>
      <p>A number of observations can be drawn from the o cial
evaluation results in Tab. 1.</p>
      <p>Considering the anchor and its context, the best results
are clearly obtained with n-gram alignment along with linear
topic segmentation. These good results are obviously to be
attributed to n-grams which yields target segments whose
HITSa
HITSc
Linear+BoW
Linear+ngrams
Hierarchical+BoW
content is closely related to that of the anchor, if not almost
similar. This tends to indicate that evaluators on Amazon
Mechanical Turk (AMT) prefer links to highly correlated
content as opposed to links targeting contents on the same
subject but with a more remote relationship.</p>
      <p>Hierarchical segmentation turned out to be deceiving. One
probable explanation is that targets are somewhat smaller
than for linear segmentation (half the length of segments
obtained using linear segmentation on average). Small target
segments make comparison with the anchor less reliable and
increase the probability of having poorly related content.</p>
      <p>Using HITS as described in Sec. 2.2 appears as a good
strategy for target selection. HITS implicitely uses a bag
of words representation and compares favorably with linear
topic segmentation when comparison with the anchor
relies on a similar representation. Introducing n-grams in the
graph might be a good option to improve the HITS-based
approach.</p>
      <p>Finally, topic segmentation algorithms yield better results
on the LIUM transcripts than on LIMSI transcripts. This
is most likely due to the fact that utterances in LIUM
transcripts correspond to visual shots. Hence, the resulting
target is visually consistent, while this is not the case for LIMSI
transcripts when using topic segmentation which relies on
utterances that are not related to visual content (LIMSI's
utterances are usually longer while reference utterances are
smaller). We believe that visual consistency is a crucial
factor for AMT evaluators.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eskevich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aly</surname>
          </string-name>
          , and et al.
          <article-title>Multimedia information seeking through search and hyperlinking</article-title>
          .
          <source>In ACM Intl. Conf. on Multimedia Retrieval</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eskevich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aly</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Ordelman</surname>
          </string-name>
          .
          <article-title>The Search and Hyperlinking task at MediaEval 2013</article-title>
          .
          <source>In Working notes of the MediaEval 2013 Workshop</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.-L.</given-names>
            <surname>Gauvain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lamel</surname>
          </string-name>
          , and
          <string-name>
            <surname>G. Adda.</surname>
          </string-name>
          <article-title>The LIMSI broadcast news transcription system</article-title>
          .
          <source>Speech Communication</source>
          ,
          <volume>37</volume>
          (
          <issue>1-2</issue>
          ):
          <volume>89</volume>
          {
          <fpage>108</fpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Guinaudeau</surname>
          </string-name>
          , G. Gravier, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Sebillot</surname>
          </string-name>
          .
          <article-title>Enhancing lexical cohesion measure with con dence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation</article-title>
          .
          <source>Computer Speech and Language</source>
          ,
          <volume>26</volume>
          (
          <issue>2</issue>
          ):
          <volume>90</volume>
          {
          <fpage>104</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          .
          <article-title>Authoritative sources in a hyperlinked environment</article-title>
          .
          <source>Journal of the ACM</source>
          ,
          <volume>46</volume>
          (
          <issue>5</issue>
          ):
          <volume>604</volume>
          {
          <fpage>632</fpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Schwenk</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Lambert</surname>
          </string-name>
          .
          <article-title>LIUM's SMT machine translation systems for WMT 2011</article-title>
          . In Workshop on Statistical Machine Translation,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Simon</surname>
          </string-name>
          , G. Gravier, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Sebillot</surname>
          </string-name>
          .
          <article-title>Leveraging lexical cohesion and disruption for topic segmentation</article-title>
          .
          <source>In Empirical Methods in NLP</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>