=Paper=
{{Paper
|id=None
|storemode=property
|title=Searching and Hyperlinking using Word Importance Segment Boundaries in MediaEval 2013
|pdfUrl=https://ceur-ws.org/Vol-1043/mediaeval2013_submission_90.pdf
|volume=Vol-1043
|dblpUrl=https://dblp.org/rec/conf/mediaeval/SchoutenAO13
}}
==Searching and Hyperlinking using Word Importance Segment Boundaries in MediaEval 2013==
Kim Schouten, Erasmus University Rotterdam, Rotterdam, the Netherlands, schouten@ese.eur.nl
Robin Aly, University of Twente, Enschede, the Netherlands, r.aly@ewi.utwente.nl
Roeland Ordelman, University of Twente, Enschede, the Netherlands, r.aly@ewi.utwente.nl
Copyright is held by the author/owner(s).
MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain

ABSTRACT
This paper reports a set of experiments performed in the context of the Searching and Hyperlinking task of the MediaEval Benchmark Initiative 2013. The Searching part challenges participants to return a ranked list of video segments that are relevant given some textual user query, while for the Hyperlinking task the aim is to return a ranked list of video segments that are relevant given some video segment. The main focus is on finding a way to compute flexible segment boundaries. This is performed by extending the term frequency part of tf-idf to include the temporal dimension of videos. Although the contribution is theoretically sound, its performance is relatively poor, which we attribute to the focus on speech data and the hyperlinking process. We plan to refine our method in the future to overcome these limitations.

1. INTRODUCTION
Videos can be long, and current search approaches that return whole videos waste the time of users searching for specific information. The Search and Hyperlinking Task in the MediaEval Benchmark Initiative 2013 [3], a refinement of its previous instance [1], models the situation where users should be pointed directly to relevant segments within videos and be offered links to video segments relevant to the segment already found. State-of-the-art search methods use fixed video segments as the return unit. However, fixed video segments have the limitation that evidence for a relevant passage can be divided between two segments or that the segments are too long. Therefore, in this paper we propose a method to determine segment boundaries that are suitable for individual queries and to rank these segments accordingly.

The paper is outlined as follows. Section 2 describes our methods for searching and hyperlinking video. Section 3 describes the details of our experiments and shows the evaluation results of our submitted runs, and finally we conclude in Section 4.

2. METHOD
In the following we describe the methods used for the searching and hyperlinking sub-tasks. To this end, we extend the well-known tf-idf ranking function to incorporate the temporal dimension of videos when computing scores.

Our intuition is that terms are important not only at the moment they were uttered but also within a time window around the utterance. We model the posterior probability of a query term q being important at time t given its utterance at time t2 as a double sigmoid function:

\[
p(q_t = 1 \mid o_q) = \frac{1}{1 + \exp\left(-\frac{t - b + c}{s}\right)} - \frac{1}{1 + \exp\left(-\frac{t - e - c}{s}\right)} \qquad (1)
\]

where
t = video time in milliseconds
q_t = binary variable of q being important at time t
o_q = the utterance of term q (milliseconds)
b = the begin time of query term q being uttered
e = the end time of query term q being uttered
c = a correction term that shifts the two sigmoids to the left and to the right, respectively, such that every t with b < t < e yields a probability close to 1
s = the steepness of the sigmoid, modeling the reach of the importance around o_q; this is a multiple of idf(q)

Given our probabilistic model of the importance of query terms, we now describe how we integrate this model into the tf-idf ranking scheme. Instead of term frequencies in a document, we calculate the score of time t as the frequency of utterances that are important to t times its idf factor. However, because we are not certain about the actual importance q_t, we calculate the expected frequency of important terms:

\[
\mathrm{score}(t) = \sum_{q \in Q} \mathrm{idf}_q \sum_{o_q \in O_q} E[q_t \mid o_q] = \sum_{q \in Q} \mathrm{idf}_q \sum_{o_q \in O_q} p(q_t = 1 \mid o_q) \qquad (2)
\]

where Q are the query terms and O_q are all occurrences of term q in the video.

Based on this ranking scheme, we determine the extent of a result segment seg as the adjacent time points t that have a score above a certain threshold (set at 0.01 in our experiments). The score of a segment seg is set to the maximum of score(t) over all time points t in seg (we break ties by selecting the shorter segment):

\[
\mathrm{score}(seg) = \max_{t \in seg} \mathrm{score}(t) \qquad (3)
\]

where t iterates over the time interval of the segment. Figure 1 shows an example of our ranking scheme using a term with large idf and a term with low idf, each with one utterance, as well as the score(t) function that combines both.
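
Equations 1 to 3 can be illustrated with a minimal Python sketch. It is not the implementation used for the submitted runs: the function names, the dictionary layout of the utterances, the numerically stable sigmoid helper, and the sampling of t on a fixed grid (step_ms) are all assumptions made for illustration. The correction term c, the steepness s = 100,000 * idf(q), and the 0.01 threshold follow the values stated in the paper.

```python
import math


def _sigmoid(x):
    """Logistic function, written so that math.exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def importance(t, b, e, s, floor=0.01):
    """Equation 1: probability that a term uttered during [b, e] matters at time t (ms)."""
    c = s * math.log(1.0 / floor - 1.0)  # correction term as set in Section 3.1
    return _sigmoid((t - b + c) / s) - _sigmoid((t - e - c) / s)


def score(t, utterances, idf, k=100_000.0):
    """Equation 2: idf-weighted expected number of important utterances at time t.

    utterances maps each query term to a list of (b, e) pairs in ms (assumed layout);
    idf maps each query term to a positive idf value.
    """
    total = 0.0
    for term, spans in utterances.items():
        s = k * idf[term]  # steepness s = 100,000 * idf(q), as in Section 3.1
        total += idf[term] * sum(importance(t, b, e, s) for b, e in spans)
    return total


def segments(utterances, idf, duration_ms, step_ms=1000, threshold=0.01):
    """Equation 3: maximal runs of sampled time points scoring above the threshold.

    Returns (start_ms, end_ms, score) tuples, best first. The sampling grid
    (step_ms) is an assumption of this sketch, not part of the paper.
    """
    found, start, last, best = [], None, None, 0.0
    for t in range(0, duration_ms + 1, step_ms):
        value = score(t, utterances, idf)
        if value > threshold:
            if start is None:
                start, best = t, value
            best = max(best, value)
            last = t
        elif start is not None:
            found.append((start, last, best))
            start = None
    if start is not None:
        found.append((start, last, best))
    # Rank by decreasing score; ties go to the shorter segment.
    return sorted(found, key=lambda seg: (-seg[2], seg[1] - seg[0]))
```

A call such as segments({"rare": [(10000, 12000)]}, {"rare": 1.0}, duration_ms=2_000_000) returns a single segment around the utterance. Under these settings an idf of 1 gives s = 100,000 ms and c of roughly 460 seconds, so the sketch also makes visible how wide the window of influence of a single utterance becomes.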
Figure 1: An example plot of two query keyword occurrences in a video. (Axes: Score over Time. Curves: score(t), double sigmoid for rare word, double sigmoid for common word.)

For the hyperlinking task, we construct a textual query from the given anchor segment. To this end, we consider the words uttered during the anchor segment and select the ten words with the largest Kullback-Leibler divergence compared to the language model of the whole collection, loosely following the method proposed in [4]. Using this query, we apply the same ranking scheme as for the search task.

3. EXPERIMENTS
In this section, we describe the experimental setup and submitted runs for the searching and hyperlinking sub-tasks. For both search queries and constructed hyperlinking queries, we return a list of video segments ranked by decreasing score.

3.1 Setup
For an overview of the dataset, please refer to [3]. Transcripts from LIUM are provided in blocks of 20 seconds, which we chose as the duration of each term. For the manual subtitles and LIMSI transcripts we used the time annotation of the individual word.

Our ranking scheme requires two parameters, see Equation 1. We set $c = s \cdot \log\left(\frac{1.0}{0.01} - 1\right)$, which ensures that the rising sigmoid is at least 0.99, and the falling sigmoid at most 0.01, for every t between b and e. This corresponds to the intuition that a term is almost certainly important close to the time it was uttered. For the steepness parameter, we use s = 100,000 * idf(q), causing rare words to have wider influence.

3.2 Results
The results for the searching sub-task are given below in terms of Mean Reciprocal Rank (MRR), mean Generalized Average Precision (mGAP), and Mean Average Segment Precision (MASP) (cf. [2]).

Runs            MRR    mGAP   MASP
LIUM            0.09   0.06   0.06
LIMSI           0.08   0.04   0.04
BBC Subtitles   0.05   0.04   0.03

For the hyperlinking sub-task, the results are reported in terms of Mean Average Precision (MAP).

Runs            MAP
LIUM            0.4081
LIMSI           0.4048
BBC Subtitles   0.3120

We see that the subtitles perform the poorest in both tasks. The LIUM transcripts outperform the ones from LIMSI, albeit with only small differences, which were not statistically significant (p = 0.05). In general, these results are in line with our previous work [5].

3.3 Discussion
The described method shows poor performance. We identify the following possible reasons. First, our ranking scheme only uses speech information to find video segments, while their relevance is sometimes determined by their visual content or by the metadata (we found several instances where this is the case). Second, we used only one parameter setting, which was the most intuitive to us. We believe other settings can improve the performance. Finally, for the query generation for the hyperlinking task, we only used the text uttered during the anchor segments. As some of them are relatively short (with a minimum of 10 seconds), we believe that considering their surroundings can improve performance.

4. CONCLUSIONS
We have described a method to rank video segments based on the uncertain importance of query terms at a particular time in a video given the term's utterances in the transcript. The time interval where a term is important was described by a probability distribution modeled as a double sigmoid function. We proposed that the points where none of the terms is likely to be important are intuitive segment boundaries specific to the current query, which many methods lack. To rank segments we expanded the standard tf-idf weighting scheme to the situation where the importance of query terms at a given time was uncertain. The final score to rank segments was the maximum expected tf-idf score.

The described method showed relatively poor performance. We plan to pursue the following paths to improve the results in the future. First, we plan to extend our scheme to incorporate visual content and metadata. Second, we will investigate multiple parameter settings to tune our method. Finally, for hyperlinking we only extracted words from the given segment, which we believe may not provide enough information. In the future we plan to include the surroundings of the anchor segment and the initial query through which a user arrived at the current segment.

5. REFERENCES
[1] M. Eskevich, G. J. F. Jones, S. Chen, R. Aly, R. Ordelman, and M. Larson. Search and Hyperlinking Task at MediaEval 2012. In MediaEval 2012 Workshop, Pisa, Italy, October 4-5 2012.
[2] M. Eskevich, W. Magdy, and G. J. F. Jones. New metrics for meaningful evaluation of informally structured speech retrieval. In Proceedings of the 34th European conference on Advances in Information Retrieval, ECIR'12, pages 170–181. Springer Berlin Heidelberg, 2012.
[3] M. Eskevich, G. J. F. Jones, S. Chen, R. Aly, and R. Ordelman. The Search and Hyperlinking Task at MediaEval 2013. In MediaEval 2013 Workshop, Barcelona, Spain, October 18-19 2013.
[4] V. Lavrenko and W. B. Croft. Language Modeling for Information Retrieval, chapter Relevance models in information retrieval, pages 11–56. Kluwer Academic Publishers, 2003.
[5] D. Nadeem, R. Aly, and R. Ordelman. UTwente does brave new tasks for MediaEval 2012: Searching and hyperlinking. In MediaEval 2012 Multimedia Benchmark Workshop, volume 927 of CEUR Workshop Proceedings. CEUR, 2012.