DCU Linking Runs at MediaEval 2013: Search and Hyperlinking Task

Shu Chen
INSIGHT Centre for Data Analytics / CNGL, Dublin City University, Dublin 9, Ireland
shu.chen4@mail.dcu.ie

Gareth J. F. Jones
CNGL, School of Computing, Dublin City University, Dublin 9, Ireland
gjones@computing.dcu.ie

Noel E. O’Connor
INSIGHT Centre for Data Analytics, Dublin City University, Dublin 9, Ireland
Noel.OConnor@dcu.ie

ABSTRACT
We describe Dublin City University (DCU)’s participation in the Hyperlinking sub-task of the Search and Hyperlinking of Television Content task at MediaEval 2013. Two methods of video hyperlink construction are reported: i) using spoken data annotation results to produce a ranked hyperlink list, and ii) linking and merging meaningful named entities in video segments to create hyperlinks. The details of algorithm design and evaluation are presented.

Keywords
Hyperlinking, Multimedia Search, Anchor Selection, Information Retrieval

1.    INTRODUCTION
   This paper presents Dublin City University (DCU)’s participation in the Hyperlinking sub-task of the Search and Hyperlinking of Television Content task at MediaEval 2013. The paper is organized as follows: Section 2 describes our automatic hyperlinking strategies, Section 3 gives experimental results, and Section 4 concludes the paper.

2.    HYPERLINKING STRATEGIES

2.1    Hyperlinking Principles
   In this subsection we describe the principles underlying our approach to the hyperlinking task. The hyperlinking framework involves three elements: the query anchor, the target segment, and the hyperlink. The query anchors, as the input to the hyperlinking framework, are defined in [1]. A target segment is a portion of a video to which a query anchor should be linked. In our approach, target segments are determined using a fixed window of 120 seconds with an overlap of 30 seconds. The spoken data in the video is available in three forms: automatic speech recognition (ASR) transcripts from LIUM Research [6] and LIMSI/Vocapia [2], and manual subtitles provided by the BBC [1]. Hyperlinks are constructed from the query anchor to a set of target segments using different hyperlinking strategies as described in the following subsections.

2.2   Hyperlinking using Text Annotation
   This strategy determines the hyperlinks based on two quality measures, one at the video level and one at the segment level. The video-level measure aims to determine the relevance between the video containing the query anchor and other videos containing potential target segments based on the text transcripts. DBpedia Spotlight¹, which implements text annotation by supervised learning through the DBpedia Ontology², was used to extract a set of terms to represent the textual content of each video. The method used to annotate terms in DBpedia Spotlight is based on a TF*ICF model [4], where TF (Term Frequency) represents the relevance of a term in the spoken video, and ICF (Inverse Candidate Frequency) is determined by the relevance of a term in DBpedia Ontology resources [4]. Given the video represented by a set of terms, the similarity score is calculated using a TF-IDF algorithm.
   The segment-level similarity uses Apache Lucene 3.6.2³ to determine the relevance between the query anchor and the potential target segments. The Lucene standard analyzer was used with the default stop word list⁴ to index ASR transcripts and manual subtitles. The search query consisted of all the spoken data in the query anchor. Scoring combines a Boolean AND filter with ranking by the Vector Space Model [3]. The final score used to rank the hyperlinks was calculated by merging the two scores as shown in Equation 1 or Equation 2.

                    Score = w_1 S_v + w_2 S_l                 (1)

                    Score = S_v S_l                           (2)

where S_v is the video-level similarity score and S_l is the segment-level similarity score. In Equation 1 we use a simple linear fusion mechanism to merge the two scores, with the weights w_1 and w_2 both set to 0.5.

2.3   Hyperlinking using Named Entities
   This strategy links named entities contained in query anchors and the potential target segments, and then merges these entities to construct hyperlinks. Apache OpenNLP⁵

¹ https://github.com/dbpedia-spotlight
² http://dbpedia.org/Ontology
³ http://lucene.apache.org/
⁴ https://lucene.apache.org/core/3_6_2/api/core/org/apache/lucene/analysis/StopAnalyzer.html
⁵ http://opennlp.apache.org/

Copyright is held by the author/owner(s).
MediaEval 2013 Workshop, October 18-19, 2013, Barcelona, Spain
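
The fixed-window segmentation of Section 2.1 and the score fusion of Equations 1 and 2 can be illustrated with a minimal sketch. It is not the code used for the submitted runs. The names `Segment`, `fixed_windows`, `fuse_linear`, `fuse_product`, and `rank_segments` are hypothetical, and two points are assumptions: that a 30-second overlap means consecutive windows start 90 seconds apart, and that the last window is truncated at the end of the video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the beginning of the video
    end: float


def fixed_windows(duration, window=120.0, overlap=30.0):
    """Split [0, duration] into fixed windows that share `overlap` seconds."""
    if window <= overlap:
        raise ValueError("window must be longer than overlap")
    step = window - overlap  # ASSUMPTION: stride of 90 s for a 120 s / 30 s setup
    segments = []
    start = 0.0
    while start < duration:
        # ASSUMPTION: the final window is cut off at the end of the video.
        segments.append(Segment(start, min(start + window, duration)))
        if start + window >= duration:
            break
        start += step
    return segments


def fuse_linear(s_v, s_l, w1=0.5, w2=0.5):
    """Equation 1: weighted sum of video-level and segment-level scores."""
    return w1 * s_v + w2 * s_l


def fuse_product(s_v, s_l):
    """Equation 2: product of the two scores."""
    return s_v * s_l


def rank_segments(candidates, fuse=fuse_linear):
    """Sort (segment, s_v, s_l) triples by fused score, best first."""
    return sorted(candidates, key=lambda c: fuse(c[1], c[2]), reverse=True)
```

With the default settings, a 300-second video yields windows starting at 0, 90, and 180 seconds. Passing `fuse=fuse_product` to `rank_segments` switches the ranking from Equation 1 to Equation 2.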
Topic (Anchor) ID       4        12       21       23       27       31       32       39       43       45
MAP                  0.8921   0.3733   0.1395   0.4925   0.0060   0.4170   0.5713   0.4127   0.1891   0.5555
P@5                  1.0000   1.0000   0.6000   1.0000   0.0000   0.8000   0.8000   1.0000   0.6000   1.0000
P@10                 1.0000   0.9000   0.6000   0.9000   0.0000   0.8000   0.7000   1.0000   0.7000   1.0000
P@20                 1.0000   0.7000   0.5500   0.8500   0.0000   0.6000   0.6000   0.8500   0.4000   0.9000

Table 1: Mean Average Precision (MAP) and P@N results for different topics in Run 3
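
For readers unfamiliar with the measures reported in Table 1, the sketch below shows the textbook definitions of P@N and average precision over a ranked list of binary relevance judgements. It is an illustration only: the official MediaEval evaluation may treat overlapping segments and unjudged results differently, and the function names are hypothetical.

```python
def precision_at_n(relevance, n):
    """Fraction of the top-n ranked items that are relevant.

    `relevance` is a list of 0/1 judgements in rank order. If fewer than
    n items were returned, the missing ones count as non-relevant.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(relevance[:n]) / n


def average_precision(relevance, total_relevant=None):
    """Mean of precision@k over the ranks k that hold a relevant item."""
    if total_relevant is None:
        # ASSUMPTION: every relevant item appears somewhere in the list.
        total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    score = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            score += hits / rank
    return score / total_relevant


def mean_average_precision(relevance_lists):
    """MAP: average of the per-topic average precision values."""
    if not relevance_lists:
        raise ValueError("at least one topic is required")
    return sum(average_precision(r) for r in relevance_lists) / len(relevance_lists)
```

For example, the ranked judgements `[1, 0, 1, 1, 0]` give P@5 = 0.6, since three of the five results are relevant.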


was used to tag words in the ASR transcripts and subtitles. All nouns tagged as NN, NP, or NNP were selected as named entities. To describe and link the named entities, a vector space model was constructed by predicting the surrounding words given the current word. We use word2vec⁶ to implement a supervised learning mechanism using a Neural Net Language Model to create the vector model of named entities. We use the ASR transcripts of videos gathered from the blip10000 collection [7] as training data. word2vec receives each named entity as input and outputs a vector V = {w_1, w_2, ..., w_k}, where w_i is a surrounding word of the current entity learned from the training data and the vector dimensionality k is set to 50, based on the experiment described in [5]. Equation 3 is used to calculate the score between different word vectors.

                    S = \frac{2 (V_i \cap V_j)}{|V_i| + |V_j|}                 (3)

where V_i \cap V_j is the total number of words contained in both V_i and V_j, and |V_i| is the length of word vector i. All named entities in the potential target segments are merged using Equation 4 to generate the final score used to obtain the ranked hyperlink list.

                    Score = \frac{\sum_{0<i<k} S_i}{N}                         (4)

where S_i is the score of an entity in a potential target segment, and N is the total number of named entities in the current segment.

3.     EXPERIMENTAL RESULTS
   Four formal runs were submitted to the Search and Hyperlinking task in MediaEval 2013, as described in Table 2. Table 3 shows the Mean Average Precision (MAP) value of each run. These results indicate that our hyperlinking strategy based on spoken data annotation (Runs 1 to 3) performs better than the strategy based on named entities (Run 4). Table 1 shows the P@N and MAP values of Run 3 for each topic. The MAP and P@N values are good for most topics, the exception being Topic (Anchor) 27, which describes Shakespeare and Global Theatre. Two other videos are related to Shakespeare and Global Theatre, but their content is presented as a cartoon. Because our method does not use visual features, it creates hyperlinks to the cartoon segments, whereas real users will notice the mismatch between TV shows and cartoons.

Run ID   Method                Data    Fuse
1        Text Annotation       M+I+S   Eq.1
2        Text Annotation       M+I+S   Eq.2
3        Text Annotation       M+U+S   Eq.1
4        Named Entities Link   M+L+S   Eq.4

Table 2: Run Details (M: Metadata, I: LIMSI, U: LIUM, S: Subtitle, Eq: Equation)

Run ID        1        2        3        4
MAP value   0.2944   0.2935   0.3109   0.0161
P@5         0.7000   0.7067   0.7267   0.0600
P@10        0.6567   0.6633   0.6567   0.1067
P@20        0.5450   0.5383   0.5433   0.0733

Table 3: Mean Average Precision (MAP) and P@N evaluation results

4.   CONCLUSIONS
   This paper presented details of DCU’s participation in the TV Data Hyperlinking task of MediaEval 2013. The evaluation shows that annotating spoken data to construct hyperlinks achieves better results than linking named entities. In our future work, we will examine the use of visual cues to improve hyperlinking performance.

5.   ACKNOWLEDGEMENT
   This work is funded by the European Commission’s Seventh Framework Programme (FP7) as part of the AXES project (ICT-269980).

6.   REFERENCES
[1] M. Eskevich, G. J. F. Jones, S. Chen, R. Aly, and R. Ordelman. Search and Hyperlinking Task at MediaEval 2013. In MediaEval 2013 Workshop, Barcelona, Spain, 2013.
[2] L. Lamel and J.-L. Gauvain. Speech Processing for Audio Indexing. In Advances in Natural Language Processing (LNCS 5221), pages 4–15. 2008.
[3] Lucene 3.6.2 Document. Apache Lucene - Scoring. https://lucene.apache.org/core/3_6_2/scoring.html, Dec. 2012.
[4] P. N. Mendes, M. Jakob, A. García-Silva, and C. Bizer. DBpedia Spotlight: Shedding Light on the Web of Documents. In Proceedings of the 7th International Conference on Semantic Systems, USA, 2011.
[5] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, volume abs/1301.3781, 2013.
[6] A. Rousseau, F. Bougares, P. Deléglise, H. Schwenk, and Y. Estève. LIUM’s systems for the IWSLT 2011 Speech Translation Tasks. In Proceedings of IWSLT 2011, 2011.
[7] S. Schmiedeke, P. Xu, I. Ferrané, M. Eskevich, C. Kofler, M. A. Larson, Y. Estève, L. Lamel, G. J. F. Jones, and T. Sikora. Blip10000: A Social Video Dataset Containing SPUG Content for Tagging and Retrieval. In Multimedia Systems Conference 2013 (MMSys ’13), pages 96–101, 2013.

⁶ https://code.google.com/p/word2vec/
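
The named-entity scoring of Equations 3 and 4 in Section 2.3 can be illustrated with the following minimal sketch, which is not the implementation behind Run 4. Each entity is represented as the set of its surrounding words, and Equation 3 is read as the overlap measure 2|A ∩ B| / (|A| + |B|). The paper does not state how the per-entity score S_i is derived from the pairwise scores, so the sketch assumes that S_i is the best match of a segment entity against any anchor entity. The names `overlap_score` and `segment_score` are hypothetical.

```python
def overlap_score(v_i, v_j):
    """Equation 3: 2 * (number of shared words) / (|v_i| + |v_j|)."""
    a, b = set(v_i), set(v_j)
    if not a and not b:
        return 0.0  # two empty vectors: define the score as zero
    return 2 * len(a & b) / (len(a) + len(b))


def segment_score(anchor_entities, segment_entities):
    """Equation 4: mean entity score over the N entities of a segment.

    ASSUMPTION: the score S_i of a segment entity is its best Equation 3
    match against any entity of the query anchor. The source text does
    not specify this step.
    """
    if not anchor_entities or not segment_entities:
        return 0.0
    scores = [
        max(overlap_score(entity, anchor) for anchor in anchor_entities)
        for entity in segment_entities
    ]
    return sum(scores) / len(scores)
```

Candidate segments would then be ranked by `segment_score` in descending order. A segment with one entity that matches the anchor exactly and one that shares no words with it receives a score of 0.5.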