=Paper=
{{Paper
|id=None
|storemode=property
|title=UWCL at MediaEval 2013: Similar Segments in Social Speech Task
|pdfUrl=https://ceur-ws.org/Vol-1043/mediaeval2013_submission_66.pdf
|volume=Vol-1043
|dblpUrl=https://dblp.org/rec/conf/mediaeval/Levow13
}}
==UWCL at MediaEval 2013: Similar Segments in Social Speech Task==
Gina-Anne Levow, Department of Linguistics, University of Washington, Box 352425, Seattle, WA, USA. levow@uw.edu

===ABSTRACT===
This paper describes the participation of the University of Washington Computational Linguistics Laboratory (UWCL) in the Similar Segments in Social Speech task at MediaEval 2013. Participants in this task develop systems that, given a span of speech from a recorded conversation, aim to identify all and only highly similar regions in other recordings. As this was a new task this year, the goal was to establish a baseline and a framework for future experimentation. The approach aimed to address two particular challenges posed by the task: the lack of prior segmentation of the conversations and the limited material provided by a single brief example segment. To this end, the system employed a query-by-example information retrieval framework, using passage retrieval to identify segments dynamically and query expansion to support robust retrieval. Query expansion provided substantial gains when applied to both manual and automatic transcriptions; results using automatic transcripts were competitive with those using manual ones.

===1. INTRODUCTION===
Recent years have seen dramatic growth in the use of social media, as well as in the sharing of multimedia materials, in venues ranging from Facebook to YouTube. Users increasingly share personal content in these social media settings. However, flexible search and targeted access to this material remain challenging, relying largely on manually assigned metadata, such as titles and tags, to identify content, rather than directly indexing the content of the multimedia materials themselves. Furthermore, not only is extraction of content from multimedia streams more challenging than from text, but skimming or browsing a media stream is also slower and more difficult than skimming text.

The Similar Segments in Social Speech (SSSS) task developed for MediaEval 2013 aims to overcome these limitations on information access. As described in the task overview paper [6], the task requires participating systems to identify similar spans of speech given an exemplar span. The resulting spans can be viewed as jump-in points for listeners searching or browsing a multimedia stream.

In contrast to the significant prior work on spoken document retrieval [3] and topic detection and tracking [7], this task applies a more general and abstract notion of similarity, rather than focusing on retrieval of documents related to particular topics or events. In addition, much of that prior work emphasized retrieval from broadcast news sources; retrieval from less formal audio sources has focused on voicemail [2] and oral history interviews [5].

The remainder of the paper is organized as follows. Section 2 presents key challenges in the task and UWCL's approach to addressing them. Section 3 describes the experimental configuration, official runs, and results, along with discussion. Section 4 concludes with plans for future work.

===2. CHALLENGES AND APPROACH===
The SSSS task posed a wide range of interesting challenges, including:
* Task modeling: would the task be best modelled as retrieval, clustering, ranking, or something else?
* Sources of similarity: how should similarity be assessed, through lexical or acoustic information or some combination?
* Segmentation: how should segments be identified, via fixed segmentation, agglomeration, or other means?
* Generalization: given a single example segment, how can we overcome differences across speakers?
* Transcription: what is the effect of transcription type, manual or automatic, on task effectiveness?

For this challenging new task, UWCL's approach built on and extended existing methodologies. In particular, the approach adopts an information retrieval perspective, using the text transcriptions of the spoken data. From this perspective, system design focused on the latter three issues identified above: segmentation, generalization, and transcription.

'''Segmentation.''' Much prior work on spoken document retrieval has either provided a gold-standard segmentation or assumed its existence. In contrast, the SSSS task provides no segmentation, and one could imagine different segmentations based on different notions of similarity. The strategy therefore aimed to create segments and jump-in points sensitive to the similarity measure and to the exemplar segment. The UWCL system exploits passage retrieval [1] to extract overlapping windowed spans within recordings, of fixed length and step in words, that have high similarity with the example. Overlapping and adjacent retrieved passages are merged and receive the rank of the highest-ranked contributing passage. Based on experimentation on the training corpus, retrieval returned the top 75 passages, 60 terms in length with a 30-term step; this windowing and merging is sketched below.
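The following Python sketch illustrates the windowing and merging just described. The official runs used Indri's built-in passage retrieval, so this is only an illustration under stated assumptions: the function names and data structures are hypothetical, while the 60-term windows, 30-term step, and merge rule follow the description above.

<syntaxhighlight lang="python">
# Illustrative sketch only: the official runs relied on Indri's
# built-in passage retrieval. Window length (60 terms) and step
# (30 terms) match the settings reported above.

def window_passages(tokens, length=60, step=30):
    """Yield (start, end) term offsets of overlapping windows."""
    last_start = max(len(tokens) - length, 0)
    for start in range(0, last_start + 1, step):
        yield start, min(start + length, len(tokens))

def merge_passages(ranked_passages):
    """Merge overlapping or adjacent passages from one recording.

    ranked_passages: list of (rank, start, end); lower rank = better.
    A merged region keeps the rank of its best contributing passage.
    """
    merged = []
    for rank, start, end in sorted(ranked_passages, key=lambda p: p[1]):
        if merged and start <= merged[-1][2]:  # overlaps or is adjacent
            best, first, last = merged[-1]
            merged[-1] = (min(best, rank), first, max(last, end))
        else:
            merged.append((rank, start, end))
    return merged

# Passages at ranks 3 and 7 overlap, so they merge and keep rank 3:
print(merge_passages([(3, 0, 60), (7, 30, 90), (12, 150, 210)]))
# -> [(3, 0, 90), (12, 150, 210)]
</syntaxhighlight>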
'''Generalization.''' Differences in lexical choice between the materials being searched and the searcher's specification of their information need are a well-known issue in information retrieval. The segments in the SSSS task, which average about 50 seconds in the training set and about 30 seconds in the test set, are not particularly short. However, it seems likely that variation between speakers and the broad notion of similarity will make lexical matching highly challenging. To address this issue, the UWCL system investigates the use of query expansion [8]. In pseudo-relevance feedback query expansion, the original query is used in a preliminary search pass. The query is then augmented, and one hopes improved, by adding highly ranked terms from the top-ranked spans, which are presumed to be relevant. The resulting query is used for final retrieval. In the UWCL system, the training set data is used to augment the small test set during expansion. The procedure used the top five passages retrieved to create a relevance model and selected the ten terms with highest likelihood under that model for expansion, as sketched below.
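Again purely as an illustration: Indri performs this expansion internally, but the following sketch shows the logic of the procedure just described, assuming a crude unigram relevance model estimated by relative frequency (the real relevance model is more sophisticated).

<syntaxhighlight lang="python">
from collections import Counter

def expand_query(query_terms, feedback_passages, n_expansion=10):
    """Sketch of pseudo-relevance feedback expansion.

    A simple unigram relevance model (relative term frequency over
    the top-ranked passages) stands in for Indri's model. Five
    feedback passages and ten expansion terms follow the settings
    reported above.
    """
    counts = Counter()
    for passage in feedback_passages[:5]:   # top five retrieved passages
        counts.update(passage.lower().split())
    if not counts:                          # no feedback text available
        return list(query_terms)
    total = sum(counts.values())
    # Rank candidate terms by likelihood under the relevance model.
    candidates = sorted(counts, key=lambda t: counts[t] / total, reverse=True)
    expansion = [t for t in candidates if t not in query_terms][:n_expansion]
    return list(query_terms) + expansion
</syntaxhighlight>

In Indri itself, comparable behavior is configured through its pseudo-relevance feedback parameters (the number of feedback passages and expansion terms) rather than implemented by hand.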
'''Transcription.''' Both manual and automatic transcripts of the spoken data are employed.

===3. EXPERIMENTATION===

====3.1 Experimental Setup====
The UWCL system employed the INDRI/LEMUR information retrieval engine (http://www.lemurproject.org) for indexing and retrieval with default settings [4]. The LEMUR system provides a sophisticated query language, has built-in support for passage retrieval, and supports pseudo-relevance feedback query expansion. We made use of two different transcriptions of the conversations: manual transcripts provided by the task organizers and automatic transcripts generously provided by the University of Edinburgh. Each conversation was converted to a single TREC-format text document for indexing. For query formulation, the system extracted all tokens in any time-aligned span which overlapped the exemplar segment; these terms were then linked through unweighted combination (Indri's #combine operator). Manual transcriptions were aligned by turn; conversion of automatic transcriptions relied on alignments at the word level. Both steps are sketched below.
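A minimal sketch of these two indexing-side steps follows, assuming the standard TREC DOC/DOCNO/TEXT tag layout and hypothetical helper names; only the #combine operator is Indri's own syntax.

<syntaxhighlight lang="python">
def to_trec_doc(conversation_id, transcript_lines):
    """Wrap one conversation as a single TREC-format document for
    indexing (standard DOC/DOCNO/TEXT layout; the exact fields in
    the UWCL system may differ)."""
    body = "\n".join(transcript_lines)
    return (f"<DOC>\n<DOCNO>{conversation_id}</DOCNO>\n"
            f"<TEXT>\n{body}\n</TEXT>\n</DOC>\n")

def exemplar_query(aligned_tokens, seg_start, seg_end):
    """Build an Indri #combine query from all tokens whose
    time-aligned span overlaps the exemplar segment.

    aligned_tokens: iterable of (token, span_start, span_end);
    spans are turns for manual transcripts, words for automatic ones.
    """
    terms = [tok for tok, start, end in aligned_tokens
             if start < seg_end and end > seg_start]  # span overlap test
    return "#combine( " + " ".join(terms) + " )"

# exemplar_query([("skiing", 12.1, 12.6), ("trip", 12.6, 13.0)], 12.0, 40.0)
# -> '#combine( skiing trip )'
</syntaxhighlight>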
====3.2 Experiment Runs and Results====
Five official runs on the test data were submitted and scored. As shown in Table 1, contrasting conditions explored the impact of transcription (manual/automatic), query expansion (yes/no), and expansion corpus (manual/automatic). The official results are also tabulated for the primary metrics, Normalized Searcher Utility Ratio (NSUR) and F-measure, as described in the task overview [6].

{| class="wikitable"
|+ Table 1: Contrastive official run settings and results
! Name !! Trans. !! Exp. !! Exp. set !! NSUR !! F
|-
| uwclman || man || no || n.a. || 0.57 || 0.58
|-
| uwclauto || auto || no || n.a. || 0.57 || 0.58
|-
| uwclmanexp || man || yes || man || 0.82 || 0.81
|-
| uwclautoexp || auto || yes || man || 0.66 || 0.68
|-
| uwclauto2exp || auto || yes || auto || 0.796 || 0.80
|}

====3.3 Discussion====
We find that, although the baseline query formulation achieves modest effectiveness, query expansion using pseudo-relevance feedback based on a matched corpus yielded substantially increased effectiveness. With the mismatched expansion corpus, the divergence between manual and automatic transcription led to a smaller, but still noticeable, improvement. Finally, it is interesting to note that, with suitable query expansion, a configuration based on automatic transcription greatly outperformed one using manual transcripts without query expansion, and was highly competitive with one using manual transcripts with query expansion.

===4. CONCLUSIONS===
UWCL's approach to the MediaEval 2013 SSSS task employed a text-based information retrieval framework, using passage retrieval to create segments dynamically. Automatic query expansion yielded strong improvements for both manual and automatic transcripts. While these approaches showed promise, many avenues for improvement remain. In addition to tuning retrieval factors, such as passage length and retrieval models, I plan to explore the integration of acoustic, especially acoustic-prosodic, evidence into measures of segment similarity, in addition to the lexical evidence already in use. Such measures could be particularly helpful in recognizing segments with similarity based less on topical content than on emotional or attitudinal content.

===Acknowledgments===
Many thanks to the task organizers, and to Steve Renals for providing the high-quality automatic transcriptions.

===5. REFERENCES===
* [1] J. P. Callan. Passage-level evidence in document retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference, pages 302–310, 1994.
* [2] J. Hirschberg, M. Bacchiani, D. Hindle, P. Isenhour, A. Rosenberg, L. Stark, L. Stead, S. Whittaker, and G. Zamchick. SCANMail: Browsing and searching speech data by content. In Proceedings of EUROSPEECH 2001, 2001.
* [3] M. Larson and G. J. F. Jones. Spoken content retrieval: A survey of techniques and technologies. Foundations and Trends in Information Retrieval, 5(4–5):235–422, 2012.
* [4] D. Metzler, T. Strohman, and W. B. Croft. Indri at TREC 2006: Lessons learned from three terabyte tracks. In Proceedings of TREC 2006, 2006.
* [5] D. W. Oard, J. Wang, G. J. Jones, R. W. White, P. Pecina, D. Soergel, X. Huang, and I. Shafran. Overview of the CLEF-2006 cross-language speech retrieval track. In CLEF 2006, 2006.
* [6] N. G. Ward, S. D. Werner, D. G. Novick, E. E. Shriberg, C. Oertel, L.-P. Morency, and T. Kawahara. The similar segments in social speech task. In Proceedings of MediaEval 2013, Barcelona, Spain, October 18–19, 2013.
* [7] C. Wayne. Multilingual topic detection and tracking: Successful research enabled by corpora and evaluation. In Proceedings of the Language Resources and Evaluation Conference (LREC) 2000, pages 1487–1494, 2000.
* [8] J. Xu and W. B. Croft. Query expansion using local and global document analysis. In Proceedings of the 19th Annual International ACM SIGIR Conference, 1996.