                          Out-of-the-box strategy for
                    Rich Speech Retrieval @ MediaEval 2011

                           Wouter Alink                                       Roberto Cornacchia
                              Spinque                                                 Spinque
                     Utrecht, The Netherlands                                Utrecht, The Netherlands
                     wouter@spinque.com                                      roberto@spinque.com


ABSTRACT
Evaluation tracks offer valuable opportunities to measure scientific and technological advances. Spinque approaches challenges such as the MediaEval Rich Speech Retrieval task with the additional goal of developing solutions that can be easily transferred from academic labs to industry. The system used during this evaluation was obtained with minimal effort and no manual optimisation, and yet it provides a reasonably good baseline to improve upon. More importantly, it is by nature an extensible approach, based on the concept of declarative search strategies rather than an ad-hoc search system.

Copyright is held by the author/owner(s).
MediaEval 2011 Workshop, September 1-2, 2011, Pisa, Italy

1.   INTRODUCTION
   Our participation in the MediaEval Rich Speech Retrieval task, described in [3], has been inspired by the quest for a simple, fast, robust, and effective approach to searching in speech transcripts. We used our generic search framework to instantiate a specific search solution for this task, with the explicit goal of producing reasonable results in the space of a few hours, including index creation, search strategy modelling, and evaluation. As argued, for example, in [2], standard textual IR techniques can be applied to speech transcripts even when the transcripts are not perfect. Our runs focus on textual search with different query keyword combinations and with rank refinement at different levels of retrieval-unit granularity.

2.   SPINQUE FRAMEWORK
   We modelled and executed our runs as search strategies within the Spinque framework. This is a prototype environment where search processes are divided into two phases: the search strategy definition and the actual search.
   Modelling search strategies in this framework corresponds to designing graph structures, where edges represent data-flows consisting of terms, documents (e.g. speech transcripts), and document sections. The nodes connected by such edges are pre-defined, general-purpose operational blocks that either provide source data (the speech transcripts and the topics) or modify their input data-flow by applying operations such as extraction of specific sections from documents or ranking of sections and documents, to name a few.
   Search strategies defined in this framework are automatically translated into a probabilistic relational query language and executed on top of an SQL database engine. The same framework has also been used to participate in other evaluation tracks, such as CLEF-IP [1].

3.   DESCRIPTION
   The speech transcripts were indexed at two levels of granularity: as whole documents as well as individual SpeechSegment sections. We did not use the provided tags and video keyframes, nor any other source of evidence.
   Our runs can be described as follows:

run1 First, all words from title (weight 0.2) and all words from short-title (weight 0.8) are used to search all documents in the collection. Then, all the SpeechSegment sections within those documents are searched using the same keywords. The start of the section is returned as the result. This strategy is depicted in Figure 1.

run2 The same as run1, except that all terms from title get a weight of 0.0 and all terms from short-title get a weight of 1.0. This effectively discards the terms from title.

run3 The same as run1, except that all terms from title get a weight of 1.0 and all terms from short-title get a weight of 0.0. This effectively discards the terms from short-title. Run3 should be considered the “required run”.

   Textual ranking is performed with the BM25 [4] retrieval method, with standard parameters b = 0.75 and k1 = 1.2. The weights 0.2 (words from title) and 0.8 (words from short-title) were found as a local optimum using a hill-climbing approach.

4.   RESULTS AND FINDINGS
   The average time for retrieving results for a topic was 230 ms. This time includes “compiling” the search strategy (i.e. translating it into SQL queries) out-of-the-box and without manual optimisations, as well as the overhead for generating the run-files. A glitch later found in our indexer may have altered results marginally: a few documents were not included in our index and could therefore not be retrieved.
   The evaluation scores for the 3 submitted runs are shown in Table 1. Scores have been measured with window sizes of 10, 30, and 60 seconds.

              Weights for          Window size (seconds)
          title   short-title      10        30        60
  Run 1    0.2       0.8         0.1320    0.2210    0.2724
  Run 2    0.0       1.0         0.1164    0.1816    0.2231
  Run 3    1.0       0.0         0.1054    0.1630    0.1968

Table 1: mGAP scores for the runs on the test-set with 50 topics (step size is 10 seconds)

   Overall scores are reasonably satisfying for a simple keyword-search approach. As expected, the combination of both the title and the short-title yields a better result than the individual runs. The best results on the test-set were obtained by assigning a larger weight to short-title keywords, which suggests that full titles may carry off-topic words that lower precision.
   We found that searching short sections produced disappointing rankings, probably due to insufficiently tuned document-length normalisation. Both parameter configurations used (for BM25 and for the title/short-title keyword mixture) could be improved with a more exhaustive exploration of their search space. The simplicity of the strategies used and the small size of the corpus at hand would indeed make such an exploration feasible here, which is not the case in general.
   One more direction for possible improvement is to experiment with more fine-grained zooming in, with search windows of e.g. entire documents, followed by 10-minute, 1-minute, and 5-second speech windows. Such a multi-stage strategy would likely retain recall and improve precision at every iteration.
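The two-stage ranking of run1 can be sketched in plain Python. This is a toy illustration under the paper's stated parameters (BM25 with b = 0.75, k1 = 1.2; title weight 0.2, short-title weight 0.8), not Spinque's actual implementation, which compiles strategies to SQL; the corpus and topic structure (`segments`, `start`, `words` fields) is invented for the example.

```python
import math
from collections import Counter

def bm25_scores(query_weights, docs, k1=1.2, b=0.75):
    """Score each doc (a list of terms) against a {term: weight} query
    with BM25; each term's contribution is scaled by its query-side
    weight, which is how the title/short-title mixture is modelled."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term, w in query_weights.items():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            s += w * idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def run1(topic, documents, top_docs=10):
    # Build the weighted query: 0.2 for title terms, 0.8 for short-title.
    q = Counter()
    for t in topic["title"].lower().split():
        q[t] += 0.2
    for t in topic["short_title"].lower().split():
        q[t] += 0.8
    # Stage 1: rank whole transcripts.
    texts = [[w for seg in d["segments"] for w in seg["words"]]
             for d in documents]
    doc_scores = bm25_scores(q, texts)
    top = sorted(range(len(documents)),
                 key=doc_scores.__getitem__, reverse=True)[:top_docs]
    # Stage 2: pool the SpeechSegments of the top transcripts,
    # re-rank them, and return the start of the best one.
    segs = [seg for i in top for seg in documents[i]["segments"]]
    seg_scores = bm25_scores(q, [s["words"] for s in segs])
    best = max(range(len(segs)), key=seg_scores.__getitem__)
    return segs[best]["start"]

# Made-up toy data for illustration.
documents = [
    {"segments": [
        {"start": 0.0,  "words": ["welcome", "to", "the", "show"]},
        {"start": 12.5, "words": ["cooking", "pasta", "with", "basil"]}]},
    {"segments": [
        {"start": 0.0,  "words": ["football", "match", "highlights"]}]},
]
topic = {"title": "italian cooking tips", "short_title": "cooking pasta"}
print(run1(topic, documents))  # jump-in point of the best segment
```

Runs 2 and 3 differ only in the two query-side weights, so the same sketch covers all three submissions.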

5.   CONCLUSIONS
   The main contribution of this paper is to show how a specific search engine for speech transcripts of reasonable quality can be instantiated with minimal effort. While out-of-the-box text search is not unique to Spinque’s framework, the ability to experiment with retrieval units of different granularities and to combine query and/or data sources easily is not common.
   We plan to improve on our first speech retrieval evaluation in two ways: firstly, by automating as much as possible the optimisation of the search strategies’ free parameters, including the choice of retrieval-unit granularities; secondly, by building on top of this optimised baseline with the addition of more sources of evidence that may be available (such as tags and video material).
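The parameter tuning mentioned above (and used in Section 3 to find the 0.2/0.8 title/short-title weights) can be sketched as generic one-dimensional hill climbing. The objective below is a made-up stand-in, a quadratic peaking at w = 0.8, standing in for the mGAP obtained by re-running the retrieval with candidate weight w; the routine itself is an illustration, not the authors' tuning code.

```python
def hill_climb(objective, w=0.5, step=0.1, lo=0.0, hi=1.0):
    """Greedy 1-D hill climbing over the short-title weight w
    (the title weight is 1 - w): try both neighbours, move when one
    improves the objective, halve the step once neither does."""
    best = objective(w)
    while step > 1e-3:
        moved = False
        for cand in (w - step, w + step):
            if lo <= cand <= hi:
                val = objective(cand)
                if val > best:
                    w, best, moved = cand, val, True
        if not moved:
            step /= 2
    return w

# Stand-in objective with a single peak at w = 0.8.
w_opt = hill_climb(lambda w: 1.0 - (w - 0.8) ** 2)
print(round(w_opt, 3))
```

Because the search is greedy, it only guarantees a local optimum, which matches the paper's wording; a more exhaustive exploration of the space (as suggested in Section 4) would simply sweep w over a grid instead.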

Figure 1: Search strategy using both title and short-title as input, first searching the whole transcript documents, then refining into sections.

6.   REFERENCES
[1] W. Alink, R. Cornacchia, and A.P. de Vries. Searching CLEF-IP by strategy. In CLEF 2009, Revised Selected Papers, Part I. Springer, 2010.
[2] J. Allan. Perspectives on information retrieval and speech. In Information Retrieval Techniques for Speech Applications, volume 2273 of Lecture Notes in Computer Science, pages 323–326. Springer Berlin / Heidelberg, 2002.
[3] M. Larson, M. Eskevich, R. Ordelman, C. Kofler, S. Schmiedeke, and G.J.F. Jones. Overview of MediaEval 2011 Rich Speech Retrieval Task and Genre Tagging Task. In MediaEval 2011 Workshop, Pisa, Italy, September 1-2 2011.
[4] S.E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In Third Text REtrieval Conference (TREC 1994), 1994.