<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UWCL at MediaEval 2013: Similar Segments in Social Speech Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gina-Anne Levow</string-name>
          <email>levow@uw.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Linguistics University of Washington</institution>
          <addr-line>Box 352425 Seattle, WA</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <fpage>18</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>This paper describes the participation of the University of Washington Computational Linguistics Laboratory (UWCL) in the Similar Segments in Social Speech task at MediaEval 2013. Participants in this task develop systems that, given a span of speech from a recorded conversation, aim to identify all and only highly similar regions in other recordings. As this was a new task for this year, the goal was to establish a baseline and a framework for future experimentation. The approach aimed to address two particular challenges posed by the task: the lack of prior segmentation of the conversations and the limited material provided by a single brief example segment. To this end, the system employed a query-by-example information retrieval framework using passage retrieval to identify segments dynamically and query expansion to support robust retrieval. Query expansion provided substantial gains when applied to both manual and automatic transcriptions; results using automatic transcripts were competitive with those using manual ones.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Recent years have seen a dramatic growth in the use of
social media as well as the sharing of multimedia materials in
venues ranging from Facebook to YouTube. Users
increasingly share personal content in these social media settings.
However, flexible search and targeted access to this material
remains challenging, relying largely on manually assigned
metadata, such as titles and tags, to identify content, rather
than directly indexing the content of these multimedia
materials themselves. Furthermore, not only is extraction of
content from multimedia streams more challenging than from
text, but skimming or browsing in a media stream is slower
and more difficult than in text.</p>
      <p>
        The Similar Segments in Social Speech (SSSS) Task
developed for MediaEval 2013 aims to overcome these limitations
in information access. As described in the task overview
paper [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the task requires participating systems to
identify similar spans of speech given an exemplar span. The
resulting spans can be viewed as jump-in points for listeners
searching or browsing through a multi-media stream.
      </p>
      <p>
        In contrast to the significant prior work on spoken
document retrieval [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and topic detection and tracking [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], this
task applies a more general and abstract notion of
similarity, rather than focusing on retrieval of documents related to
particular topics or events. In addition, much of that prior
work emphasized retrieval from broadcast news sources.
Retrieval from less formal audio sources has focused on
voicemail [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and oral history interviews [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>The remainder of the paper is organized as follows.
Section 2 presents key challenges in the task and UWCL's
approach to addressing them. Section 3 describes the
experimental configuration, official runs, and results, along with
discussion. Section 4 concludes with plans for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. CHALLENGES AND APPROACH</title>
      <p>The SSSS task posed a wide range of interesting
challenges. These issues included:</p>
      <p>Task modeling: would the task be best modelled as
retrieval, clustering, ranking, or something else?</p>
      <p>Sources of similarity: how should similarity be
assessed, through lexical or acoustic information or some
combination?</p>
      <p>Segmentation: how should segments be identified, via
fixed segmentation, agglomeration, or other means?</p>
      <p>Generalization: given a single example segment, how
can we overcome differences across speakers?</p>
      <p>Transcription: what is the effect of transcription type,
manual or automatic, on task effectiveness?</p>
      <p>For this challenging new task, UWCL's approach built on
and extended existing methodologies. In particular, the
approach adopts an information retrieval perspective, using the
text transcriptions of the spoken data. From this
perspective, system design focused on the latter three issues
identified above: segmentation, generalization, and transcription.</p>
      <p>
        <bold>Segmentation</bold> Much prior work on spoken document
retrieval has either provided a gold standard segmentation or
assumed its existence. In contrast, the SSSS task does not
provide a segmentation, and one could imagine different
segmentations based on different notions of similarity. Thus,
the strategy aimed to create segments and jump-in points
sensitive to the similarity measure and to the exemplar
segment. The UWCL system exploits passage retrieval [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to
extract overlapping windowed spans within recordings, with
fixed length and step in words, that have high similarity with
the example. Overlapping and adjacent retrieved passages
are merged and receive the rank of the highest-ranked
contributing passage. Based on experimentation on the training
corpus, retrieval returned the top 75 passages, which were
60 terms in length with a 30-term step.
      </p>
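The windowing-and-merging strategy described above can be sketched as follows. The 60-term windows, 30-term step, and the rule that a merged span keeps the rank of its best contributing passage come from the text; the function names, data layout, and the stand-in for the retrieval engine's scoring are illustrative assumptions, not the actual INDRI/LEMUR implementation.

```python
def make_passages(tokens, length=60, step=30):
    """Overlapping windows of `length` tokens with a `step`-token stride."""
    passages = []
    for start in range(0, max(len(tokens) - length, 0) + 1, step):
        passages.append((start, min(start + length, len(tokens))))
    return passages

def merge_ranked(passages):
    """Merge overlapping or adjacent (start, end, rank) spans.

    A merged span receives the rank of its highest-ranked (i.e.
    lowest-numbered) contributing passage, as described in the text.
    """
    merged = []
    for start, end, rank in sorted(passages):
        if merged and start <= merged[-1][1]:  # overlaps or touches previous
            pstart, pend, prank = merged[-1]
            merged[-1] = (pstart, max(pend, end), min(prank, rank))
        else:
            merged.append((start, end, rank))
    return merged
```

For example, a passage ranked 1 that overlaps a passage ranked 3 yields a single merged span ranked 1.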
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Official runs: transcription, query expansion, and expansion set.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Run</th><th>Transcript</th><th>Expansion</th><th>Exp. set</th></tr>
          </thead>
          <tbody>
            <tr><td>uwclman</td><td>man</td><td>no</td><td>n.a.</td></tr>
            <tr><td>uwclauto</td><td>auto</td><td>no</td><td>n.a.</td></tr>
            <tr><td>uwclmanexp</td><td>man</td><td>yes</td><td>man</td></tr>
            <tr><td>uwclautoexp</td><td>auto</td><td>yes</td><td>man</td></tr>
            <tr><td>uwclauto2exp</td><td>auto</td><td>yes</td><td>auto</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        <bold>Generalization</bold> Differences in lexical choice between
materials being searched and the searcher's specification of their
information need are a well-known issue in information
retrieval. The segments in the SSSS task, which average about
50 seconds in the training set and about 30 seconds in the
test set, are not particularly short. However, it seems likely
that variation between speakers and the broad notion of
similarity will make lexical match highly challenging. To
address this issue, the UWCL system investigates the use
of query expansion [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In pseudo-relevance feedback query
expansion, the original query is used in a preliminary search
pass. The query is then augmented and, one hopes,
improved by adding highly ranked terms from the top-ranked
spans, which are presumed to be relevant. The resulting query
is used for final retrieval. In the UWCL system, the training
set data is used to augment the small test set during
expansion. The procedure used the top five passages retrieved to
create a relevance model and selected the ten terms with
highest likelihood under that model for expansion.
      </p>
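The expansion step can be sketched as follows. The top-5 passages and ten expansion terms match the text; the unigram relevance model here (term likelihoods estimated by pooling the top passages' tokens) is a deliberate simplification of LEMUR's built-in pseudo-relevance feedback, and all names are illustrative.

```python
from collections import Counter

def expansion_terms(top_passages, n_terms=10, stopwords=frozenset()):
    """Select expansion terms from the top-ranked passages.

    Simplified relevance model: pool the tokens of the top passages
    and return the n_terms terms with highest likelihood (frequency)
    under that pooled model.
    """
    counts = Counter(tok for passage in top_passages
                     for tok in passage if tok not in stopwords)
    return [term for term, _ in counts.most_common(n_terms)]

def expand_query(query_terms, top_passages, n_terms=10):
    """Augment the original query with feedback terms (unweighted)."""
    extra = [t for t in expansion_terms(top_passages, n_terms)
             if t not in query_terms]
    return list(query_terms) + extra
```

In a full pseudo-relevance feedback loop, the expanded query produced here would then be submitted for the final retrieval pass.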
      <p><bold>Transcription</bold> Both manual and automatic transcripts
of the spoken data are employed.</p>
    </sec>
    <sec id="sec-3">
      <title>3. EXPERIMENTATION</title>
    </sec>
    <sec id="sec-4">
      <title>3.1 Experimental Setup</title>
      <p>
        The UWCL system employed the INDRI/LEMUR
information retrieval engine (http://www.lemurproject.org) for
indexing and retrieval with default settings [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The LEMUR
system provides a sophisticated query language, has built-in
support for passage retrieval, and supports pseudo-relevance
feedback query expansion. We made use of two different
transcriptions of the conversations: manual transcripts
provided by the task organizers and automatic transcripts
generously provided by the University of Edinburgh. Each
conversation was converted to a single TREC-format text
document for indexing. For query formulation, the system
extracted all tokens in any time-aligned span which overlapped
the exemplar segment. These terms were then linked through
unweighted combination (the #combine operator). Manual
transcriptions were aligned by turn; conversion of automatic
transcriptions relied on alignments at the word level.
      </p>
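The query formulation step can be sketched as follows. The overlap criterion (all tokens in any time-aligned span overlapping the exemplar segment) and the unweighted #combine wrapper come from the text; the representation of alignments as (start, end, token) triples is an assumed layout for illustration.

```python
def formulate_query(aligned_tokens, seg_start, seg_end):
    """Build an Indri-style #combine query from all tokens whose
    time-aligned span overlaps the exemplar segment.

    aligned_tokens: iterable of (start, end, token) triples (assumed
    layout; the actual transcripts align by turn or by word).
    """
    terms = [tok for start, end, tok in aligned_tokens
             if start < seg_end and end > seg_start]  # any overlap counts
    return "#combine( " + " ".join(terms) + " )"
```

For instance, with an exemplar spanning 0.5–2.0 seconds, a token aligned to 1.0–2.5 seconds is included because its span partially overlaps the segment.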
    </sec>
    <sec id="sec-5">
      <title>3.2 Experiment runs and results</title>
      <p>
        Five official runs on the test data were submitted and
scored. As shown in Table 1, contrasting conditions explored
the impact of transcription (manual/automatic), query
expansion (yes/no), and expansion corpus (manual/automatic).
The official results are also tabulated for the primary
metrics, Normalized Searcher Utility Ratio (NSUR) and F-measure,
as described in the task overview [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>3.3 Discussion</title>
      <p>We find that, although the baseline query formulation
achieves modest effectiveness, query expansion using
pseudo-relevance feedback based on a matched corpus yielded
substantially increased effectiveness. With the mismatched
expansion corpus, the divergence between manual and
automatic transcription led to a smaller, but still noticeable,
improvement. Finally, it is interesting to note that, with
suitable query expansion, a configuration based on automatic
transcription greatly outperformed one using manual
transcripts without query expansion and was highly competitive
with one using manual transcripts with query expansion.
</p>
    </sec>
    <sec id="sec-7">
      <title>4. CONCLUSIONS</title>
      <p>UWCL's approach to the MediaEval 2013 SSSS task
employed a text-based information retrieval approach, using
passage retrieval to create segments dynamically.
Automatic query expansion yielded strong improvements for both
manual and automatic transcripts. While these approaches
showed promise, many avenues for improvement remain. In
addition to tuning retrieval factors, such as passage length
and retrieval models, I plan to explore the integration of
acoustic, especially acoustic-prosodic, evidence into
measures of segment similarity, in addition to the lexical
evidence already in use. Such measures could be particularly
helpful in recognizing segments with similarity based less on
topical content than on emotional or attitudinal content.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>Many thanks to the task organizers and also to Steve Renals
for providing the high-quality automatic transcriptions.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Callan</surname>
          </string-name>
          .
          <article-title>Passage-level evidence in document retrieval</article-title>
          .
          <source>In Proceedings of the 17th annual ACM SIGIR conference</source>
          , pages
          <fpage>302</fpage>
          –
          <lpage>310</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hirschberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bacchiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hindel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Isenhour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Stark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Stead</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Whittaker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Zamchick</surname>
          </string-name>
          .
          <article-title>SCANMail: Browsing and searching speech data by content</article-title>
          .
          <source>In Proceedings of EUROSPEECH</source>
          <year>2001</year>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Larson</surname>
          </string-name>
          and
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>Spoken content retrieval: A survey of techniques and technologies</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          ,
          <volume>5</volume>
          (
          <issue>4–5</issue>
          ):
          <fpage>235</fpage>
          –
          <lpage>422</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Metzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Strohman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. B.</given-names>
            <surname>Croft</surname>
          </string-name>
          . Indri at TREC 2006:
          <article-title>Lessons learned from three terabyte tracks</article-title>
          .
          <source>In Proceedings of TREC</source>
          <year>2006</year>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Oard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. W.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pecina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Soergel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Shafran</surname>
          </string-name>
          .
          <article-title>Overview of the CLEF-2006 cross-language speech retrieval track</article-title>
          .
          <source>In CLEF-2006</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N. G.</given-names>
            <surname>Ward</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Werner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Novick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. E.</given-names>
            <surname>Shriberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Oertel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-P.</given-names>
            <surname>Morency</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Kawahara</surname>
          </string-name>
          .
          <article-title>The similar segments in social speech task</article-title>
          .
          <source>In Proceedings of MediaEval</source>
          <year>2013</year>
          , Barcelona, Spain, October 18–19,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wayne</surname>
          </string-name>
          .
          <article-title>Multilingual topic detection and tracking: Successful research enabled by corpora and evaluation</article-title>
          .
          <source>In Language Resources and Evaluation Conference (LREC)</source>
          <year>2000</year>
          , pages
          <fpage>1487</fpage>
          –
          <lpage>1494</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Query expansion using local and global document analysis</article-title>
          .
          <source>In Proceedings of the 19th Annual International ACM SIGIR Conference</source>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>