=Paper=
{{Paper
|id=None
|storemode=property
|title=HITS and IRISA at MediaEval 2013: Search and Hyperlinking Task
|pdfUrl=https://ceur-ws.org/Vol-1043/mediaeval2013_submission_40.pdf
|volume=Vol-1043
|dblpUrl=https://dblp.org/rec/conf/mediaeval/GuinaudeauSGS13
}}
==HITS and IRISA at MediaEval 2013: Search and Hyperlinking Task==
Camille Guinaudeau (HITS, Schloss-Wolfsbrunnenweg 35, D-69118 Heidelberg, Germany; firstname.lastname@h-its.org)

Anca-Roxana Simon*, Guillaume Gravier**, Pascale Sébillot*** (IRISA & INRIA Rennes; Univ. Rennes 1*, CNRS**, INSA***; 35042 Rennes Cedex, France; firstname.lastname@irisa.fr)

Copyright is held by the author/owner(s). MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain.

===Abstract===
This paper describes our approach and results in the hyperlinking sub-task at MediaEval 2013. We implement a two-step method. The first step establishes a shortlist of relevant videos, and the second step selects a target segment from each video in the shortlist. We focus on target selection and compare two distinct strategies. The first exploits a bipartite graph relating utterances and words to find the most relevant utterances, from which segments are derived. The second uses explicit topic segmentation, either linear or hierarchical, to select the target segments.

===1. Introduction===
We present the joint participation of HITS and IRISA in the Search and Hyperlinking task at MediaEval 2013 [2]. We limit ourselves to the hyperlinking sub-task, where one is required to find targets for hyperlinks whose source is a given anchor. As last year, we adopt a two-step approach. A shortlist of semantically related target videos is first established by comparing the anchor, possibly with its context, to entire videos using standard information retrieval techniques. In the second step, we search for the most relevant target segment within each video in the shortlist, respecting the imposed time constraints.

In 2013, we focused on the second step, i.e., the selection of the most relevant target segment inside each video in the shortlist of semantically related videos. We believe that precise target selection is a crucial step for the hyperlinking task: wrong timestamps within semantically related videos can make the result useless even though the video is relevant per se. However, previous work on the hyperlinking sub-task [1] mostly focused on linking anchors with relevant videos and paid little attention to precise target selection. We implemented two distinct approaches and compare several variants of each. The first relies on a link analysis algorithm that exploits links in a graph to propagate associations between words and utterances. This selects a small number of utterances as the link target. The second relies on explicit topic segmentation to find topically coherent targets closely related to the anchor. It extends last year's approach with hierarchical segmentation and fine-grained text alignment techniques.

===2. System description===
We mostly exploit the provided transcripts [3, 6], which were lemmatized, keeping only nouns, non-modal verbs and adjectives. Utterances are sentences for the manual transcripts, speech segments for LIMSI's transcripts and shots for LIUM's.<sup>1</sup>

<sup>1</sup> Since utterance boundaries are absent from LIUM's transcripts, these transcripts were aligned with shot boundaries.

====2.1 Shortlist of semantically related videos====
To limit the detailed search for hyperlink targets given an anchor, we first establish a shortlist of the 50 most related videos, considering each video as a whole. Both anchors and videos are represented as vectors of transcript terms with BM25 weighting. When the context in which the anchor appears is considered, we use a linear combination of the BM25 weights obtained from the anchor and from the context, respectively. This combination places a strong emphasis on the anchor (0.8 vs. 0.2). Videos are ranked in decreasing order of cosine similarity with the anchor (possibly with its context). Videos that contain the anchor are removed, i.e., the same file or a file corresponding to a rebroadcast of the same content. The shortlist contains the top 50 videos to which we want to relate the anchor. These are further processed to select a precise and sufficiently short hyperlink target.

====2.2 Selection of hyperlink targets====
For each of the top 50 related videos, we need to extract the target segment for the link that will be established with the anchor. According to the evaluation rules, a target must be an excerpt lasting between 10 s and 2 min. We took two approaches based on the same underlying idea: finding the consecutive shots or utterances, within the given time constraints, that are most related to the anchor. The first approach relies on the hyperlink-induced topic search (HITS) algorithm [5], a link analysis method used to weight each shot according to its relationships with words from the anchor. The second approach uses topic segmentation to find coherent segments, which are then compared to the anchor.

'''Target selection with link analysis.''' For a given shortlist video, link analysis relies on a bipartite graph. The first set of nodes represents utterances and the second set represents words. Edges reflect the pairing between words and utterances, i.e., an edge between utterance S<sub>i</sub> and word W<sub>j</sub> indicates that W<sub>j</sub> appears in S<sub>i</sub>. Exploiting the bipartite graph structure, the HITS algorithm assigns a score to each node n in the graph, where the score indicates how well n is connected to the others. HITS iteratively propagates scores along edges, taking into account the importance of the nodes connected to n. For hyperlink target selection, the idea is to give a high score to utterances that are connected to words related to the anchor and its context. The scores of word nodes are initialized with a value reflecting the word frequency in the anchor, either alone (HITS<sub>a</sub>) or with its context (HITS<sub>c</sub>). Frequent words increase the score of utterances containing them. This in turn improves the score of words that appear in the vicinity of anchor words, i.e., in the same utterance.

After the HITS algorithm converges, each shot is scored by adding the scores of all utterances within it. Merging heuristics then build segments, and the best-scoring segment is picked as the link target. Adjacent shots with a score above a threshold are merged into a single segment, adding their scores, provided the result is less than 2 min long. Segments shorter than 10 s are merged with their highest-scoring neighbor.

'''Target selection with topic segmentation.''' As an alternative to link analysis, linear and hierarchical topic segmentation are used to partition each video in the shortlist into homogeneous segments. Each segment is compared to the anchor to find the most significant one. All topic segmentation experiments consider the anchor together with its context. Linear topic segmentation is achieved using [4], which yields a set of segments with high vocabulary coherence. In the hierarchical approach, each segment resulting from linear segmentation is segmented again. This second pass uses a criterion that combines lexical cohesion and disruption [7] so as to avoid over-segmentation. The idea of hierarchical segmentation is to obtain smaller segments to relate to the anchor, and thus possibly more accurate targets.

For each segment resulting from either linear or hierarchical topic segmentation, we calculate its similarity with the anchor and its context. We investigate two similarity measures. The first is a classical cosine similarity with tf-idf weights and thus relies on a bag-of-words representation. This strategy was applied to linear segmentation (Linear+BoW) and to hierarchical segmentation (Hierarchical+BoW). To achieve a better comparison, we also experimented with n-gram alignment, where similarity is computed separately for words, bigrams and trigrams. Similarities from the different n-gram orders are linearly combined with weights of 0.2, 0.3 and 0.5 for orders 1, 2 and 3, respectively. N-gram comparison was applied to linear segmentation (Linear+ngrams).

The best-scoring segment is used as the target, and the following post-processing rules are applied to meet the time constraints. Segments longer than 2 min are resegmented using a 2 min sliding window, and the best-scoring window within the segment is kept. Segments shorter than 10 s are combined with their best-scoring neighbor until the minimum length is reached.

===3. Results===
A number of observations can be drawn from the official evaluation results in Tab. 1.

{| class="wikitable"
|+ Table 1: Results for all methods on the 2013 test set
! Method !! LIMSI !! LIUM !! Manual
|-
| HITS<sub>a</sub> || 0.0328 || 0.0253 || n/a
|-
| HITS<sub>c</sub> || 0.0305 || 0.0237 || n/a
|-
| Linear+BoW || 0.0219 || 0.0281 || 0.0436
|-
| Linear+ngrams || 0.0399 || 0.0467 || 0.0633
|-
| Hierarchical+BoW || 0.0193 || 0.0233 || 0.0362
|}

Considering the anchor and its context, n-gram alignment combined with linear topic segmentation clearly gives the best results. These good results can be attributed to the n-grams, which yield target segments whose content is closely related to that of the anchor, if not nearly identical. This suggests that evaluators on Amazon Mechanical Turk (AMT) prefer links to highly correlated content. They appear to favor such links over links to content on the same subject that is more remotely related.

Hierarchical segmentation turned out to be disappointing. One probable explanation is that its targets are smaller than those obtained with linear segmentation, at half their length on average. Small target segments make comparison with the anchor less reliable and increase the probability of selecting poorly related content.

Using HITS as described in Sec. 2.2 appears to be a good strategy for target selection. HITS implicitly uses a bag-of-words representation. It compares favorably with linear topic segmentation when the comparison with the anchor relies on a similar representation (Linear+BoW). Introducing n-grams in the graph might improve the HITS-based approach.

Finally, topic segmentation algorithms yield better results on the LIUM transcripts than on the LIMSI transcripts. This is most likely because utterances in the LIUM transcripts correspond to visual shots. Hence, the resulting target is visually consistent. This is not the case for the LIMSI transcripts, where topic segmentation relies on utterances that are not related to the visual content. LIMSI's utterances are usually longer, while reference utterances are smaller. We believe that visual consistency is a crucial factor for AMT evaluators.

===4. References===
[1] M. Eskevich, G. J. F. Jones, R. Aly, et al. Multimedia information seeking through search and hyperlinking. In ACM Intl. Conf. on Multimedia Retrieval, 2013.

[2] M. Eskevich, G. J. F. Jones, S. Chen, R. Aly, and R. Ordelman. The Search and Hyperlinking task at MediaEval 2013. In Working Notes of the MediaEval 2013 Workshop, 2013.

[3] J.-L. Gauvain, L. Lamel, and G. Adda. The LIMSI broadcast news transcription system. Speech Communication, 37(1-2):89–108, 2002.

[4] C. Guinaudeau, G. Gravier, and P. Sébillot. Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation. Computer Speech and Language, 26(2):90–104, 2011.

[5] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, 1999.

[6] H. Schwenk and P. Lambert. LIUM's SMT machine translation systems for WMT 2011. In Workshop on Statistical Machine Translation, 2011.

[7] A. Simon, G. Gravier, and P. Sébillot. Leveraging lexical cohesion and disruption for topic segmentation. In Empirical Methods in NLP, 2013.