=Paper= {{Paper |id=None |storemode=property |title=HITS and IRISA at MediaEval 2013: Search and Hyperlinking Task |pdfUrl=https://ceur-ws.org/Vol-1043/mediaeval2013_submission_40.pdf |volume=Vol-1043 |dblpUrl=https://dblp.org/rec/conf/mediaeval/GuinaudeauSGS13 }} ==HITS and IRISA at MediaEval 2013: Search and Hyperlinking Task== https://ceur-ws.org/Vol-1043/mediaeval2013_submission_40.pdf
               HITS and IRISA at MediaEval 2013: Search and
                            Hyperlinking Task

Camille Guinaudeau
HITS
Schloss-Wolfsbrunnenweg 35
D-69118 Heidelberg, Germany
firstname.lastname@h-its.org

Anca-Roxana Simon*, Guillaume Gravier**, Pascale Sébillot***
IRISA & INRIA Rennes
Univ. Rennes 1*, CNRS**, INSA***
35042 Rennes Cedex, France
firstname.lastname@irisa.fr

Copyright is held by the author/owner(s). MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain

ABSTRACT
This paper describes our approach and results in the hyperlinking sub-task at MediaEval 2013. A two-step method is implemented: the first step establishes a shortlist of relevant videos; in the second step, a target segment is selected from each video in the shortlist. We focus on target selection, comparing two distinct strategies. The first one exploits a bipartite graph relating utterances and words to find the most relevant utterances, from which segments are derived. The second one uses explicit topic segmentation, whether hierarchical or not, to select the target segments.

1.    INTRODUCTION
   We present the joint participation of HITS and IRISA in the Search and Hyperlinking task at MediaEval 2013 [2], limiting ourselves to the hyperlinking sub-task, where one is required to find targets for hyperlinks whose source is a given anchor. As last year, we adopt a two-step approach. A shortlist of semantically related target videos is first established by comparing the anchor, possibly with context, to entire videos using standard information retrieval techniques. In the second step, we search for the most relevant target segment within each video in the shortlist, respecting the time constraints imposed.
   In 2013, we focused on the last step, i.e., the selection of the most relevant target segment inside each video in the shortlist of semantically related videos. We believe that precise target selection is a crucial step for the hyperlinking task: wrong timestamps within semantically related videos can make the result useless even though the video is per se relevant. However, previous work on the hyperlinking sub-task [1] mostly focused on linking anchors with relevant videos and did not pay much attention to precise target selection. We implemented two distinct approaches, of which several variants are compared. The first approach relies on a link analysis algorithm which exploits links in a graph to propagate associations between words and utterances so as to select a small number of utterances as the link target. The second one relies on explicit topic segmentation to find topically coherent targets closely related to the anchor, extending last year’s approach to hierarchical segmentation and fine-grained text alignment techniques.

2.    SYSTEM DESCRIPTION
   We mostly exploit the transcripts provided [3, 6], which were lemmatized, keeping only nouns, non-modal verbs and adjectives. Utterances are sentences for manual transcripts, speech segments for LIMSI’s and shots for LIUM’s¹.

¹ Utterance boundaries being absent from LIUM’s transcripts, alignment with shot boundaries was performed.

2.1    Shortlist of semantically related videos
   To limit detailed search for hyperlink targets given an anchor, we first establish a shortlist of the 50 most related videos, considering each video as a whole. A vector representation of transcripts is used for both anchors and videos, adopting the BM25 weighting. When the context in which the anchor appears is considered, a linear combination of the BM25 weights obtained respectively from the anchor and from the context is used, with a strong emphasis on the anchor (0.8 vs. 0.2). Videos are ranked in decreasing order of cosine similarity with the anchor (possibly with its context), removing videos which contain the anchor (same file or file corresponding to a rebroadcast of the same content). The shortlist contains the top 50 videos to which we want to relate the anchor and which are further processed to select a precise and short enough hyperlink target.

2.2    Selection of hyperlink targets
   For each item in the top 50 related videos, we need to extract the target segment for the link that will be established with the anchor. According to the evaluation rules, the target should be an excerpt with a duration between 10 s and 2 min. Two approaches were taken, based on the same underlying idea, i.e., finding the consecutive shots or utterances within the given time constraints which are the most related to the anchor. The first approach relies on the hyperlink-induced topic search (HITS) algorithm [5], a link analysis method used to weight each shot according to its relationships with words from the anchor. The second approach implements topic segmentation to find coherent segments which are compared to the anchor.

Target selection with link analysis.
   For a given shortlist video, link analysis relies on a bipartite graph where the first set of nodes represents utterances and the second one represents words. Edges reflect the pairing between words and utterances, i.e., an edge between utterance S_i and word W_j indicates that W_j appears in S_i.
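As an illustration only, the bipartite utterance-word graph and a HITS-style score propagation over it could be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used for the submitted runs. The function names (`build_bipartite_graph`, `propagate_scores`), the L2 normalization and the small fixed number of iterations are all hypothetical choices. Word scores are seeded with anchor word counts, and the iteration count is kept small so that this seeding still influences the resulting utterance scores.

```python
from collections import Counter, defaultdict
import math


def build_bipartite_graph(utterances):
    """Return (utterance -> words, word -> utterances) adjacency maps."""
    utt_to_words = {i: set(words) for i, words in enumerate(utterances)}
    word_to_utts = defaultdict(set)
    for i, words in utt_to_words.items():
        for w in words:
            word_to_utts[w].add(i)
    return utt_to_words, dict(word_to_utts)


def _l2_normalize(scores):
    """Scale scores to unit Euclidean norm (left unchanged if all are zero)."""
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {k: s / norm for k, s in scores.items()}


def propagate_scores(utterances, anchor_words, iterations=3):
    """HITS-style propagation between word nodes and utterance nodes.

    Assumption: word scores start from anchor word counts, and only a few
    iterations are run so that this initialization is not washed out.
    """
    utt_to_words, word_to_utts = build_bipartite_graph(utterances)
    anchor_counts = Counter(anchor_words)
    word_score = {w: float(anchor_counts[w]) for w in word_to_utts}
    utt_score = {i: 0.0 for i in utt_to_words}
    for _ in range(iterations):
        # An utterance is scored by the words it contains ...
        utt_score = _l2_normalize(
            {i: sum(word_score[w] for w in ws) for i, ws in utt_to_words.items()}
        )
        # ... and a word is scored by the utterances it appears in.
        word_score = _l2_normalize(
            {w: sum(utt_score[i] for i in us) for w, us in word_to_utts.items()}
        )
    return utt_score, word_score


# Toy example: lemmatized utterances of one shortlist video and an anchor.
utterances = [["election", "vote"], ["vote", "poll"], ["weather", "rain"]]
utt_score, word_score = propagate_scores(utterances, anchor_words=["election", "vote"])
best = max(utt_score, key=utt_score.get)  # index of the highest-scoring utterance
```

In this toy input the third utterance shares no word with the rest of the graph, so it receives no score, while the word "poll" gains weight only because it co-occurs with the anchor word "vote".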
   Exploiting the bipartite graph structure, the HITS algorithm aims at assigning a score to each node n in the graph, where the score indicates how well n is connected to the others. HITS iteratively propagates scores via edges, taking into account the importance of nodes connected to n. In the framework of hyperlink target selection, the idea is to give a high score to utterances that are connected to words related to the anchor and its context. Scores in word nodes are initialized with a value reflecting the word frequency in the anchor, alone (HITSa) or with context (HITSc). Frequent words increase the score of utterances containing such words, in turn improving the score of words that appear in the vicinity (i.e., the same utterance) of anchor words.
   After convergence of the HITS algorithm, a score is obtained for each shot by adding the scores of all utterances within the shot. Merging heuristics are finally used to yield segments, from which the best scoring one is picked as the link target. Adjacent shots with a score above a threshold are merged into a single segment if the result is less than 2 min long, adding their scores. Segments shorter than 10 s are merged with the highest scoring neighbor.

Target selection with topic segmentation.
   As an alternative to link analysis, linear and hierarchical topic segmentation is used to partition each video in the shortlist into homogeneous segments. Each segment is compared to the anchor, considered with its context in all topic segmentation experiments, to find the most significant one.
   Linear topic segmentation is achieved using [4], providing a set of segments which exhibit high vocabulary coherence. In the hierarchical approach, each segment resulting from linear segmentation is again segmented using a criterion which combines lexical cohesion and disruption [7] so as to avoid over-segmentation. The idea of hierarchical segmentation is to have smaller segments to relate to the anchor, and thus possibly more accurate targets.
   For each segment resulting either from linear or from hierarchical topic segmentation, the similarity with the anchor and its context is calculated. We investigate two measures. The first one is a classical cosine similarity measure assuming tf-idf weights, thus relying on a bag-of-words representation. This strategy was applied to linear (Linear+BoW) and to hierarchical segmentation (Hierarchical+BoW). To achieve better comparison, we also experimented with n-gram alignments, where similarity is computed between words, bigrams and trigrams separately. Similarities from different n-gram orders are linearly combined with weights equal to 0.2, 0.3 and 0.5 for order 1, 2 and 3 respectively. N-gram comparison was applied to linear segmentation (Linear+ngrams).
   The best scoring segment is used as target, applying the following postprocessing rules to match time constraints. Segments longer than 2 min are resegmented using a sliding window of 2 min, taking the best scoring window within the segment. Segments shorter than 10 s are combined with the best scoring neighbor until the minimum length is reached.

3.    RESULTS
   A number of observations can be drawn from the official evaluation results in Tab. 1.

                     LIMSI      LIUM       MANUAL
HITSa                0.0328     0.0253        -
HITSc                0.0305     0.0237        -
Linear+BoW           0.0219     0.0281      0.0436
Linear+ngrams        0.0399     0.0467      0.0633
Hierarchical+BoW     0.0193     0.0233      0.0362

Table 1: Results for all methods on the 2013 test set

   Considering the anchor and its context, the best results are clearly obtained with n-gram alignment along with linear topic segmentation. These good results are most likely to be attributed to n-grams, which yield target segments whose content is closely related to that of the anchor, if not nearly identical. This tends to indicate that evaluators on Amazon Mechanical Turk (AMT) prefer links to highly correlated content as opposed to links targeting content on the same subject but with a more remote relationship.
   Hierarchical segmentation turned out to be disappointing. One probable explanation is that targets are somewhat smaller than for linear segmentation (half the length of segments obtained using linear segmentation on average). Small target segments make comparison with the anchor less reliable and increase the probability of having poorly related content.
   Using HITS as described in Sec. 2.2 appears to be a good strategy for target selection. HITS implicitly uses a bag-of-words representation and compares favorably with linear topic segmentation when comparison with the anchor relies on a similar representation. Introducing n-grams in the graph might be a good option to improve the HITS-based approach.
   Finally, topic segmentation algorithms yield better results on the LIUM transcripts than on the LIMSI transcripts. This is most likely due to the fact that utterances in LIUM transcripts correspond to visual shots. Hence, the resulting target is visually consistent, while this is not the case for LIMSI transcripts when using topic segmentation, which relies on utterances that are not related to visual content (LIMSI’s utterances are usually longer while reference utterances are shorter). We believe that visual consistency is a crucial factor for AMT evaluators.

4.    REFERENCES
[1] M. Eskevich, G. J. F. Jones, R. Aly, et al. Multimedia information seeking through search and hyperlinking. In ACM Intl. Conf. on Multimedia Retrieval, 2013.
[2] M. Eskevich, G. J. F. Jones, S. Chen, R. Aly, and R. Ordelman. The Search and Hyperlinking task at MediaEval 2013. In Working notes of the MediaEval 2013 Workshop, 2013.
[3] J.-L. Gauvain, L. Lamel, and G. Adda. The LIMSI broadcast news transcription system. Speech Communication, 37(1-2):89–108, 2002.
[4] C. Guinaudeau, G. Gravier, and P. Sébillot. Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation. Computer Speech and Language, 26(2):90–104, 2011.
[5] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, 1999.
[6] H. Schwenk and P. Lambert. LIUM’s SMT machine translation systems for WMT 2011. In Workshop on Statistical Machine Translation, 2011.
[7] A. Simon, G. Gravier, and P. Sébillot. Leveraging lexical cohesion and disruption for topic segmentation. In Empirical Methods in NLP, 2013.