=Paper= {{Paper |id=None |storemode=property |title=Searching and Hyperlinking using Word Importance Segment Boundaries in MediaEval 2013 |pdfUrl=https://ceur-ws.org/Vol-1043/mediaeval2013_submission_90.pdf |volume=Vol-1043 |dblpUrl=https://dblp.org/rec/conf/mediaeval/SchoutenAO13 }} ==Searching and Hyperlinking using Word Importance Segment Boundaries in MediaEval 2013== https://ceur-ws.org/Vol-1043/mediaeval2013_submission_90.pdf
        Searching and Hyperlinking using Word Importance
             Segment Boundaries in MediaEval 2013

                 Kim Schouten                              Robin Aly                   Roeland Ordelman
         Erasmus University Rotterdam                University of Twente              University of Twente
          Rotterdam, the Netherlands              Enschede, the Netherlands         Enschede, the Netherlands
            schouten@ese.eur.nl                    r.aly@ewi.utwente.nl               r.aly@ewi.utwente.nl


Copyright is held by the author/owner(s).
MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain

ABSTRACT
This paper reports a set of experiments performed in the context of the
Searching and Hyperlinking task of the MediaEval Benchmark Initiative 2013.
The Searching part challenges participants to return a ranked list of video
segments that are relevant given some textual user query, while for the
Hyperlinking task the aim is to return a ranked list of video segments that
are relevant given some video segment. The main focus is on finding a way to
compute flexible segment boundaries. This is performed by extending the term
frequency part of tf-idf to include the temporal dimension of videos.
Although the contribution is theoretically sound, its performance is
relatively poor, which we attribute to the focus on speech data and the
hyperlinking process. We plan to refine our method in the future to overcome
these limitations.

1.   INTRODUCTION
   Videos can be long, and current search approaches that return whole videos
waste the time of users searching for specific information. The Search and
Hyperlinking Task in the MediaEval Benchmark Initiative 2013 [3], a
refinement of its previous instance [1], models the situation where users
should be directly pointed to relevant segments within videos and be offered
links to video segments relevant to the already found video segment.
State-of-the-art search methods use fixed video segments as a return unit.
However, fixed video segments have the limitation that evidence for a
relevant passage can be divided between two segments or that the segments are
too long. Therefore, in this paper we propose a method to determine segment
boundaries that are suitable for individual queries and to rank these
segments accordingly.
   The paper is outlined as follows. Section 2 describes our methods for
search and hyperlinking video. Section 3 describes the details of our
experiment and shows the evaluation results of our submitted runs, and
finally we conclude in Section 4.

2.   METHOD
   In the following we describe our methods used for the searching and
hyperlinking sub-tasks. To this end, we extend the well-known tf-idf ranking
function to incorporate the temporal dimension of videos when computing
scores.
   Our intuition is that terms are not only important at the moment they were
uttered but also within a time window around the utterance. We model the
posterior probability of a query term $q$ being important at time $t$ given
its utterance $o_q$ as a double sigmoid function:

$$p(q_t = 1 \mid o_q) = \frac{1}{1 + \exp\left(-\frac{t - b + c}{s}\right)} - \frac{1}{1 + \exp\left(-\frac{t - e - c}{s}\right)} \tag{1}$$

where
 $t$ = video time in milliseconds
 $q_t$ = binary variable of $q$ being important at time $t$
 $o_q$ = the utterance of term $q$ (milliseconds)
 $b$ = the begin time of query term $q$ being uttered
 $e$ = the end time of query term $q$ being uttered
 $c$ = a correction term that shifts the two sigmoids to the left and to the
       right, respectively, such that every $t$ with $b < t < e$ yields a
       probability close to 1
 $s$ = the steepness of the sigmoid, modeling the reach of the importance
       around $o_q$; this is a multiple of $\mathrm{idf}(q)$

   Given our probabilistic model of the importance of query terms, we now
describe how we integrate this model into the tf-idf ranking scheme. Instead
of counting term frequencies in a document, we calculate the score of time
$t$ as the frequency of utterances that are important at $t$, each weighted
by the idf factor of its term. However, because we are not certain about the
actual importance $q_t$, we calculate the expected frequency of important
terms:

$$\mathrm{score}(t) = \sum_{q \in Q} \sum_{o_q \in O_q} \mathrm{idf}_q \, E[q_t \mid o_q] = \sum_{q \in Q} \mathrm{idf}_q \sum_{o_q \in O_q} p(q_t = 1 \mid o_q) \tag{2}$$

where $Q$ are the query terms and $O_q$ are all occurrences of term $q$ in
the video.
   Based on this ranking scheme, we determine the extent of a result segment
$seg$ as the adjacent $t$'s that have a score above a certain threshold (set
at 0.01 in our experiments). The score of a segment $seg$ is set to be the
maximum $\mathrm{score}(t)$ over every time point $t$ in $seg$ (we break ties
by selecting the shorter segment):

$$\mathrm{score}(seg) = \max_{t \in seg} \mathrm{score}(t) \tag{3}$$

where $t$ iterates over the time interval of the segment. Figure 1 shows an
example of our ranking scheme using a term with large idf and a term with low
idf, each with one utterance, as well as the $\mathrm{score}(t)$ function
that combines both.
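The ranking scheme of Equations 1 to 3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the submitted runs. The function names, the dictionary-based data layout, the numerically stable sigmoid helper, and the idea of sampling score(t) on a caller-supplied time grid are all assumptions made for the example. The defaults for the steepness factor (100,000), the correction term, and the threshold (0.01) follow the values reported in this paper.

```python
import math


def _sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def double_sigmoid(t, b, e, s, c):
    """Equation 1: importance at time t of a term uttered from b to e (ms)."""
    return _sigmoid((t - b + c) / s) - _sigmoid((t - e - c) / s)


def score(t, query_terms, utterances, idf, s_factor=100_000.0, floor=0.01):
    """Equation 2: idf-weighted expected number of important utterances at t.

    utterances maps a term to a list of (begin, end) times in milliseconds;
    idf maps a term to its idf value. Both layouts are assumptions.
    """
    total = 0.0
    for q in query_terms:
        s = s_factor * idf[q]                  # steepness, a multiple of idf(q)
        c = s * math.log(1.0 / floor - 1.0)    # correction term
        for b, e in utterances.get(q, []):
            total += idf[q] * double_sigmoid(t, b, e, s, c)
    return total


def extract_segments(times, scores, threshold=0.01):
    """Group adjacent sampled time points whose score exceeds the threshold.

    Returns (start, end, segment_score) tuples, where segment_score is the
    maximum score inside the segment (Equation 3). The list is sorted by
    decreasing score, with ties broken in favour of the shorter segment.
    """
    segments = []
    start, prev, best = None, None, 0.0
    for t, sc in zip(times, scores):
        if sc > threshold:
            if start is None:
                start, best = t, sc
            else:
                best = max(best, sc)
            prev = t
        elif start is not None:
            segments.append((start, prev, best))
            start = None
    if start is not None:
        segments.append((start, prev, best))
    segments.sort(key=lambda seg: (-seg[2], seg[1] - seg[0]))
    return segments
```

In this sketch a caller would evaluate `score` on a regular grid of time points (for example one sample per second, an arbitrary choice) and pass the resulting lists to `extract_segments`. A finer grid gives more precise segment boundaries at a higher computational cost.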
[Figure 1: plot of Score against Time, showing score(t), the double sigmoid
for a rare word, and the double sigmoid for a common word.]

Figure 1: An example plot of two query keyword occurrences in a video.

   For the hyperlinking task, we construct a textual query from the given
anchor segment. To this end, we consider the words uttered during the anchor
segment and select the ten words with the largest Kullback-Leibler divergence
compared to the language model of the whole collection, loosely following the
method proposed in [4]. With this query, we use the same ranking scheme as
for the search task.

3.   EXPERIMENTS
   In this section, we describe the experimental setup and submitted runs for
the searching and hyperlinking sub-tasks. For both search queries and
constructed hyperlinking queries, we return a ranked list of video segments
by decreasing score.

3.1   Setup
   For an overview of the dataset used, please refer to [3]. Transcripts from
LIUM are provided in blocks of 20 seconds, which we chose as the duration of
each term. For the manual subtitles and LIMSI transcripts we used the time
annotation for the individual word.
   Our ranking scheme requires two parameters (see Equation 1). We set
$c = s \cdot \log\left(\frac{1.0}{0.01} - 1\right)$, which ensures that the
sigmoids will yield a value of approximately 0.99 between $b$ and $e$. This
corresponds to the intuition that a term is almost certainly important close
to the time it was uttered. For the steepness parameter, we use
$s = 100{,}000 \cdot \mathrm{idf}(q)$, causing rare words to have wider
influence.

3.2   Results
   The results for the searching sub-task are given below in terms of Mean
Reciprocal Rank (MRR), mean Generalized Average Precision (mGAP), and Mean
Average Segment Precision (MASP) (cf. [2]).

    Runs             MRR     mGAP    MASP
    LIUM             0.09    0.06    0.06
    LIMSI            0.08    0.04    0.04
    BBC Subtitles    0.05    0.04    0.03

   For the hyperlinking sub-task, the results are reported in terms of Mean
Average Precision (MAP).

    Runs             MAP
    LIUM             0.4081
    LIMSI            0.4048
    BBC Subtitles    0.3120

   We see that the subtitles perform the poorest in both tasks. The LIUM
transcripts outperform the ones from LIMSI, albeit with only small
differences, which were statistically insignificant (p = 0.05). In general,
these results are in line with our previous work [5].

3.3   Discussion
   The described method shows poor performance. We identify the following
possible reasons. First, our ranking scheme only uses speech information to
find video segments, while their relevance is sometimes determined by their
visual content or metadata (we found several instances where this is the
case). Second, we used only one parameter setting, which was the most
intuitive to us. We believe other settings can improve the performance.
Finally, for the query generation for the hyperlinking task, we only used the
text uttered during the anchor segments. As some of them are relatively short
(with a minimum of 10 seconds), we believe that considering their surrounding
context can improve performance.

4.   CONCLUSIONS
   We have described a method to rank video segments based on the uncertain
importance of query terms at a particular time in a video given the term's
utterances in the transcript. The time interval where a term is important was
described by a probability distribution modeled as a double sigmoid function.
We proposed that the points where it is probable that none of the terms is
important are intuitive segment boundaries specific to the current query,
which many methods lack. To rank segments we expanded the standard tf-idf
weighting scheme to the situation where the importance of query terms at a
given time was uncertain. The final score to rank segments was the maximum
expected tf-idf score.
   The described method showed relatively poor performance. We plan to pursue
the following paths to improve the results in the future. First, we plan to
extend our scheme to incorporate visual content and metadata. Second, we will
investigate multiple parameter settings to tune our method. Finally, for
hyperlinking we only extracted words from the given segment, which we believe
may not provide enough information. In the future we plan to include the
surroundings of the anchor segment and the initial query for which a user
arrived at the current segment.

5.   REFERENCES
[1] M. Eskevich, G. J. F. Jones, S. Chen, R. Aly, R. Ordelman, and
    M. Larson. Search and Hyperlinking Task at MediaEval 2012. In MediaEval
    2012 Workshop, Pisa, Italy, October 4-5 2012.
[2] M. Eskevich, W. Magdy, and G. J. F. Jones. New metrics for meaningful
    evaluation of informally structured speech retrieval. In Proceedings of
    the 34th European conference on Advances in Information Retrieval,
    ECIR'12, pages 170–181. Springer Berlin Heidelberg, 2012.
[3] M. Eskevich, G. J. F. Jones, S. Chen, R. Aly, and R. Ordelman. The Search
    and Hyperlinking Task at MediaEval 2013. In MediaEval 2013 Workshop,
    Barcelona, Spain, October 18-19 2013.
[4] V. Lavrenko and W. Bruce Croft. Language Modeling for Information
    Retrieval, chapter Relevance models in information retrieval, pages
    11–56. Kluwer Academic Publishers, 2003.
[5] D. Nadeem, R. Aly, and R. Ordelman. UTwente does brave new tasks for
    MediaEval 2012: Searching and hyperlinking. In MediaEval 2012 Multimedia
    Benchmark Workshop, volume 927 of CEUR Workshop Proceedings. CEUR, 2012.