<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LACS System: Analysis on Retrieval Models for the MediaEval 2014 Search and Hyperlinking Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Justin Chiu</string-name>
          <email>Jchiu1@andrew.cmu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Rudnicky</string-name>
          <email>Alex.Rudnicky@cs.cmu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Language Technologies Institute, School of Computer Science, Carnegie Mellon University</institution>
          ,
          <addr-line>Pittsburgh</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe the LACS submission to the Search sub-task of the Search and Hyperlinking Task at MediaEval 2014. Our experiments investigate how different retrieval models interact with word stemming and stopword removal. On the development data, we segment the subtitles and Automatic Speech Recognition (ASR) transcripts into fixed-length time units and examine the effect of different retrieval models. We find that stemming provides consistent improvement, while stopword removal is more sensitive to the choice of retrieval model on the subtitles. These manipulations do not yield stable improvement on the ASR transcripts. Our experiments on the test data focus on the subtitles; there, the performance gap between retrieval models is much smaller than on the development data. We achieved 0.477 MAP on the test data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The amount and variety of multimedia data available online is
rapidly increasing. As a result, techniques for identifying content
relevant to a query must improve so that large multimedia collections
can be processed effectively. Existing work exploits multiple
modalities for multimedia retrieval [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]; the ASR
transcript is one of those modalities, which resembles the Speech
Retrieval framework. However, we believe there is more to be
discovered on the Speech Retrieval side, especially the interaction
between retrieval models and ASR transcript quality. Established
retrieval models are commonly used for text retrieval, and applying
them to ASR transcripts is a standard approach to Speech Retrieval.
However, there are fundamental differences between text documents and
spoken documents, and different retrieval models may have different
characteristics that can be beneficial, or harmful, for retrieval
performance. Specifically, we examine word stemming and stopword
removal, techniques that have been shown to be helpful in text
retrieval. Can these techniques also help in speech retrieval? This
question is the basis for our experiments. We carried out two sets of
experiments on the development data to examine the difference between
subtitles and ASR transcripts. Each set investigates the effectiveness
of different retrieval models and processing techniques. Due to time
constraints, we only submitted runs on the subtitle test data. We find
that the performance gap observed on the development data does not
show up in the test data.
      </p>
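The two text manipulations studied here can be illustrated with a small sketch. The stopword list and the crude suffix-stripping stemmer below are toy stand-ins, not the actual list or stemmer (e.g. the Krovetz stemmer [8]) a production system would use.

```python
# Toy preprocessing pipeline: stopword removal plus a crude suffix
# stemmer. Illustrative only; a real system would use a full stemmer.

STOPWORDS = {"the", "a", "an", "of", "is", "in", "on", "and", "to"}

def crude_stem(word):
    """Strip a few common English suffixes (illustrative only)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text, stem=True, remove_stopwords=True):
    """Tokenize, then optionally remove stopwords and stem."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return tokens

print(preprocess("The speaker is discussing retrieval models"))
# → ['speaker', 'discuss', 'retrieval', 'model']
```

Each experimental condition below corresponds to toggling the `stem` and `remove_stopwords` flags of such a pipeline before indexing and querying.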
    </sec>
    <sec id="sec-2">
      <title>3. EXPERIMENTS ON DEV DATA</title>
      <p>We first present our results on the development (dev) data,
reporting Mean Reciprocal Rank (MRR). The dev experiment is a
known-item retrieval task. The parameters for the Okapi retrieval
model are k1 = 1.2, b = 0.75 and k3 = 7, and the μ for the LM is 2500.
From Tables 1 and 2 we can observe the interaction between the
different processing steps and retrieval models. Stemming and stopword
removal provide consistent improvements on the subtitles. For the ASR
transcript, on the other hand, their effect appears unstable. Aside
from differences due to recognition errors, one possible contributing
factor is vocabulary size. The vocabulary size for the subtitles is
251,506, while that of the ASR transcript is 83,094, about one third
of the subtitle vocabulary. The smaller vocabulary, combined with
stemming or stopword removal, can decimate the words in the transcript
and thus harm the retrieval result. Another phenomenon we observed is
a significant performance gap between retrieval models: the TF-IDF
model outperforms the LM and Okapi models, which was unexpected. Since
the dev data is a known-item retrieval task (for each query there is
exactly one matching speech segment), we suspect the dev data may be
biased in favor of the TF-IDF model. Another possible factor in the
TF-IDF model's superior performance is smoothing: both the LM and
Okapi models rely on smoothing parameters, whereas the TF-IDF model
uses none. If the data contain many exact matches between queries and
documents, TF-IDF may outperform the other retrieval models precisely
because of this absence of smoothing.</p>
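For reference, Okapi BM25 scoring with the parameter values stated above (k1 = 1.2, b = 0.75) can be sketched as follows; the k3 query-side saturation term is omitted since it only matters for long queries. This is an illustrative implementation, not the exact code behind our runs.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a bag-of-words query.

    docs is the whole collection (a list of token lists), used for
    document-frequency statistics and average document length.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in docs if term in d)
        if df == 0:
            continue  # a term unseen in the collection contributes nothing
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)
        f = tf[term]
        # Term-frequency saturation with document-length normalization.
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

docs = [["speech", "retrieval", "model"], ["cooking", "show"], ["speech", "news"]]
print([bm25_score(["speech", "retrieval"], d, docs) for d in docs])
```

The document matching both query terms receives the highest score, a document matching neither receives zero.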
    </sec>
    <sec id="sec-3">
      <title>4. EXPERIMENTS ON TEST DATA</title>
      <p>The experiments on the test data are an ad-hoc retrieval
task, no longer restricted to one result per query. Due to time
constraints, we only submitted systems based on the subtitle data. Our
submissions use both word stemming and stopword removal, as this setup
gave the most promising results on the dev data. Results on the test
data are shown in Table 3; our TF-IDF run achieved a MAP of 0.477.</p>
      <p>
The performance gap between retrieval models is much smaller than on
the dev data. Yet the trend is the same: TF-IDF gives the best
performance of the retrieval models compared. We suspect that the
absence of smoothing contributes to, and can explain, TF-IDF's
superior performance; in a regular retrieval task, TF-IDF is not
expected to outperform Okapi and LM consistently. While running the
experiments on the test data, we noticed a difference between dev and
test queries: the dev queries contain more words than the test
queries. We originally thought this might affect the performance of
the different retrieval models, but it does not appear to be an issue.
Still, we suggest that the characteristics of the dev and test queries
be made more consistent, so that the two datasets are better
matched.</p>
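The two evaluation measures used in this paper, MRR on the known-item dev data and MAP on the ad-hoc test data, reduce to the following per-query computations (a standard sketch, not the official evaluation script):

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """MRR component for known-item retrieval (one relevant item per query,
    as on the dev data)."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

def average_precision(ranked_ids, relevant_ids):
    """AP for ad-hoc retrieval (possibly many relevant items per query,
    as on the test data); MAP is the mean of AP over queries."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / len(relevant_ids) if relevant_ids else 0.0
```

For example, the relevant item at rank 2 yields a reciprocal rank of 0.5, and a ranking with relevant items at ranks 1 and 3 out of two relevant items yields AP = (1/1 + 2/3) / 2.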
    </sec>
    <sec id="sec-4">
      <title>5. ANALYSIS</title>
      <p>We find that the TF-IDF retrieval model is the best of the
three models tested. We believe this is because it does no smoothing.
Generally speaking, however, smoothing provides significant
improvements on standard retrieval tasks. We conducted experiments
with the LM retrieval model without smoothing; the resulting MAP on
the dev data is below 0.05. We can only assume that the TF-IDF model
happens to handle query words absent from a document in a way that
suits our data. A possible reason for the performance gap on the dev
data is query length: with longer queries, TF-IDF (which relies on
exact word matching) is stronger than the other approaches. The test
queries are much shorter, so the gap is not as large as on the dev
data.</p>
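The collapse of the unsmoothed LM run is easy to reproduce in a sketch: under a pure maximum-likelihood document model, any query term absent from the document drives the query likelihood to zero, while Dirichlet smoothing (μ = 2500, as used above) backs off to collection statistics. Illustrative code, not our system's implementation:

```python
import math
from collections import Counter

def lm_loglikelihood(query, doc, collection, mu=2500.0):
    """Query log-likelihood under a Dirichlet-smoothed document language
    model. With mu = 0 (no smoothing) any query term missing from the
    document sends the score to -inf, which is why the unsmoothed LM
    run collapsed on the dev data."""
    coll_tf = Counter(collection)
    coll_len = len(collection)
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = coll_tf[term] / coll_len
        p = (tf[term] + mu * p_coll) / (len(doc) + mu)
        if p == 0.0:
            return float("-inf")  # unseen term, no smoothing mass
        score += math.log(p)
    return score
```

With μ = 2500 a document missing one query term still receives a finite score; with μ = 0 its score is minus infinity, so it can never be ranked.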
      <p>
        Research in the Spoken Term Detection community suggests
using context for improving retrieval performance [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or using
retrieval system fusion [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We did not complete our experiments
on using context to improve retrieval performance, but we tried system
fusion with our three retrieval models. The fused system's performance
usually falls between those of the two systems being fused. We
conjecture that the three retrieval models used in this work are
generally similar to one another, and that fusion does not help
because they lack complementarity.
      </p>
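One standard fusion rule is CombSUM over min-max-normalized scores; the sketch below (not necessarily the exact variant we ran) shows why fusing highly similar systems tends to land between them rather than above both: the fused score is just the sum of near-duplicate rankings.

```python
def combsum(runs):
    """CombSUM fusion: min-max normalize each run's scores, then sum.

    runs is a list of {doc_id: score} dicts, one per retrieval model.
    Returns doc ids sorted by fused score, best first."""
    fused = {}
    for run in runs:
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0  # guard against a constant-score run
        for doc_id, score in run.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + (score - lo) / span
    return sorted(fused, key=fused.get, reverse=True)

runs = [{"a": 2.0, "b": 1.0, "c": 0.0}, {"a": 0.1, "b": 0.9, "c": 0.5}]
print(combsum(runs))
# → ['b', 'a', 'c']
```

Here the two runs disagree and the fusion settles on a compromise ordering; when the runs agree almost everywhere, the fused ranking simply mirrors them.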
    </sec>
    <sec id="sec-5">
      <title>6. CONCLUSION</title>
      <p>We examined how different retrieval models interact with text
processing techniques such as word stemming and stopword removal on
the two forms of the dev data: subtitles and ASR transcripts. We find
that stemming and stopword removal provide consistent improvements on
the subtitle data, while on the ASR transcript these steps mostly harm
performance, except for stemming with the TF-IDF retrieval model. The
results on the test data show that the differences between retrieval
models are less significant when the retrieval task contains more
possible targets. TF-IDF still performs best, which we believe is due
to its absence of smoothing.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chiu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rudnicky</surname>
          </string-name>
          .
          <article-title>Using Conversational Word Burst in Spoken Term Detection</article-title>
          .
          <source>In Proc. of Interspeech</source>
          . Lyon, France,
          <year>2013</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Trmal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Povey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rudnicky</surname>
          </string-name>
          .
          <article-title>Combination of FST and CN Search in Spoken Term Detection</article-title>
          .
          <source>In Proc. of Interspeech 2014. Singapore</source>
          ,
          <year>2014</year>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Cover</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Thomas</surname>
          </string-name>
          .
          <source>Elements of Information Theory</source>
          .
          <year>1991</year>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eskevich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. N.</given-names>
            <surname>Racca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ordelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>The Search and Hyperlinking Task at MediaEval 2014</article-title>
          .
          <source>In Proc. of the MediaEval 2014 Multimedia Benchmark Workshop</source>
          . Barcelona, Spain,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eskevich</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>Time-based Segmentation and Use of Jump-in Points in DCU Search Runs at the Search and Hyperlinking Task at MediaEval 2013</article-title>
          .
          <source>In Proc of the MediaEval 2013 Multimedia Benchmark Workshop</source>
          . Barcelona, Spain,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Gauvain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lamel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Adda</surname>
          </string-name>
          .
          <article-title>The LIMSI broadcast news transcription system</article-title>
          .
          <source>Speech Communication 37, page 89-108</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mitamura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Hauptmann</surname>
          </string-name>
          .
          <article-title>Zero-Example Event Search using MultiModal Pseudo Relevance Feedback</article-title>
          .
          <source>In Proc. of International Conference on Multimedia Retrieval, page 297. ACM</source>
          , Glasgow, UK,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Krovetz</surname>
          </string-name>
          .
          <article-title>Viewing morphology as an inference process</article-title>
          .
          <source>In Proc. of SIGIR'93, page 191-202</source>
          , Pittsburgh, USA,
          <year>1993</year>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Boughanem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J. F.</given-names>
            <surname>Jones</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Sparck-Jones</surname>
          </string-name>
          .
          <article-title>Okapi at TREC-6: Automatic adhoc, VLC, routing, filtering and QSDR</article-title>
          .
          <source>In Proc. of Text Retrieval Conference (TREC-6)</source>
          , pages
          <fpage>125</fpage>
          -
          <lpage>136</lpage>
          ,
          <year>1998</year>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhai</surname>
          </string-name>
          . Notes on Lemur TFIDF model http://www.cs.cmu.edu/~lemur/tfidf.ps
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>