<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Model Fusion Experiments for the Cross Language Speech Retrieval Task at CLEF 2007</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Muath Alzghool</string-name>
          <email>alzghool@site.uottawa.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diana Inkpen</string-name>
          <email>diana@site.uottawa.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Information Technology and Engineering, University of Ottawa</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the participation of the University of Ottawa group in the Cross-Language Speech Retrieval (CL-SR) task at CLEF 2007. We present the results of the submitted runs for the English collection. We used two Information Retrieval systems in our experiments, SMART and Terrier, with two query expansion techniques: one based on a thesaurus and one based on blind relevance feedback. We proposed two novel data fusion methods for merging the results of several models (retrieval schemes available in SMART and Terrier). Our experiments showed that the combination of query expansion methods and data fusion methods helps to improve the retrieval performance. We also present cross-language experiments, where the queries are automatically translated by combining the results of several online machine translation tools. Experiments on indexing the manual summaries and keywords gave the best retrieval results.</p>
      </abstract>
      <kwd-group>
        <kwd>Data Fusion</kwd>
        <kwd>Retrieval Models</kwd>
        <kwd>Query Expansion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>This paper presents the third participation of the University of Ottawa group in the Cross-Language Speech
Retrieval (CL-SR) track at CLEF 2007. We describe our systems and then report results for the runs submitted for
the English collection, along with results for many additional runs on the same collection. We experimented
with many possible weighting schemes for indexing the documents and the queries, and with several query
expansion techniques. Several researchers have explored the idea of combining the results of
different retrieval strategies, different document representations, and different query representations. The
motivation is that each technique retrieves a different set of relevant documents, so combining the
results could produce a better result than any individual technique. We propose new data fusion
techniques for combining the results of different Information Retrieval (IR) schemes. We applied our data fusion
techniques to monolingual settings and to cross-language settings where the queries are automatically translated
from French and Spanish into English by combining the results of several online machine translation (MT) tools.
Finally, we present our best results, obtained when manual summaries and manual keywords were indexed.</p>
    </sec>
    <sec id="sec-2">
      <title>2 System Description</title>
      <p>
        The University of Ottawa Cross-Language Information Retrieval systems were built with off-the-shelf
components. For the retrieval part, the SMART [
        <xref ref-type="bibr" rid="ref11 ref3">3, 11</xref>
        ] IR system and the Terrier [
        <xref ref-type="bibr" rid="ref10 ref2">2, 10</xref>
        ] IR system were tested
with many different weighting schemes for indexing the collection and the queries.
      </p>
      <p>
        SMART was originally developed at Cornell University in the 1960s. SMART is based on the vector space
model of information retrieval. We used nnn.ntn, ntn.ntn, lnn.ntn, ann.ntn, ltn.ntn, atn.ntn, ntn.nnn, nnc.ntc,
ntc.ntc, ntc.nnc, lnc.ntc, anc.ntc, ltc.ntc, atc.ntc weighting schemes [
        <xref ref-type="bibr" rid="ref11 ref3">3, 11</xref>
        ]; lnn.ntn performed very well in
CLEF CL-SR 2005 and 2006 [
        <xref ref-type="bibr" rid="ref1 ref6">1, 6</xref>
        ].
      </p>
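      <p>As an illustration of the SMART notation, the following Python sketch scores documents with the lnn.ntn scheme. It is a sketch under stated assumptions, not the SMART implementation: we assume that the first triple weights the documents and the second the query, that "l" means 1 + ln(tf), "t" means idf = ln(N/df), and "n" means no change or no normalization. The function names and the toy collection are hypothetical.</p>
      <preformat>
```python
import math
from collections import Counter


def lnn_weights(tokens):
    """Document side (lnn): 1 + ln(tf), no idf, no length normalization."""
    return {term: 1.0 + math.log(tf) for term, tf in Counter(tokens).items()}


def ntn_weights(tokens, doc_freq, num_docs):
    """Query side (ntn): raw tf times idf = ln(N / df), no normalization."""
    weights = {}
    for term, tf in Counter(tokens).items():
        df = doc_freq.get(term, 0)
        if df > 0:  # terms absent from the collection contribute nothing
            weights[term] = tf * math.log(num_docs / df)
    return weights


def similarity(doc_tokens, query_tokens, doc_freq, num_docs):
    """Inner product of the lnn document vector and the ntn query vector."""
    doc_w = lnn_weights(doc_tokens)
    query_w = ntn_weights(query_tokens, doc_freq, num_docs)
    return sum(w * doc_w.get(term, 0.0) for term, w in query_w.items())


# Toy collection of three tokenized "documents" (hypothetical data).
docs = [
    ["death", "marches", "death"],
    ["forced", "labor", "camps"],
    ["liberation", "camps"],
]
doc_freq = Counter(term for doc in docs for term in set(doc))
scores = [similarity(doc, ["death", "marches"], doc_freq, len(docs)) for doc in docs]
print(scores)
```
      </preformat>
      <p>Only the first toy document shares terms with the query, so it is the only one with a non-zero score. The other schemes listed above would change only the two weighting functions, for example by adding cosine normalization for a "c" in the third position.</p>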
      <p>
        Terrier was originally developed at the University of Glasgow. It is based on Divergence from Randomness
models (DFR) where IR is seen as a probabilistic process [
        <xref ref-type="bibr" rid="ref10 ref2">2, 10</xref>
        ]. We experimented with the In(exp)C2
weighting model, one of Terrier’s DFR-based document weighting models.
      </p>
      <p>
        For translating the queries from French and Spanish into English, several free online machine translation
tools were used. The idea behind using multiple translations is that they might provide a greater variety of words and
phrases, thereby improving retrieval performance. Seven online MT systems [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] were used for translating
from Spanish and from French into English. We combined the outputs of the MT systems by simply
concatenating all the translations. All seven translations of a title made the title of the translated query; the same
was done for the description and narrative fields. We used the combined topics for all the cross-language
experiments reported in this paper.
      </p>
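      <p>The concatenation step can be sketched in Python as follows. This is a minimal illustration, assuming each MT system's output is available as a dictionary keyed by topic field; the function name, the field keys, and the two sample translations are hypothetical (the experiments used seven systems).</p>
      <preformat>
```python
FIELDS = ("title", "description", "narrative")


def combine_translations(translations):
    """Concatenate, field by field, the outputs of several MT systems.

    `translations` is a list of dicts, one per MT system, each mapping a
    topic field name to that system's English translation of the field.
    """
    combined = {}
    for field in FIELDS:
        parts = [t[field].strip() for t in translations if t.get(field)]
        combined[field] = " ".join(parts)
    return combined


# Two hypothetical MT outputs for the same topic.
mt_outputs = [
    {"title": "Death marches",
     "description": "Accounts of death marches",
     "narrative": "Relevant passages describe forced marches"},
    {"title": "Marches of death",
     "description": "Stories about the death marches",
     "narrative": "Relevant segments describe forced marches"},
]
topic = combine_translations(mt_outputs)
print(topic["title"])
```
      </preformat>
      <p>Words on which the translations agree are repeated in the combined field, so they receive a higher term frequency in the query, while words proposed by only one system still contribute once.</p>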
      <p>
        We used two query expansion methods. The first is based on the Shoah Visual History Foundation
thesaurus provided with the MALACH collection; our method adds two items and their alternatives (synonyms)
from the thesaurus, based on the similarity between the thesaurus terms and the title field of each topic. More
specifically, to select two items from the thesaurus, we used SMART with the title of each topic as the query and the
thesaurus terms as documents, using the weighting scheme lnn.ntn. After computing the similarity, the top two
thesaurus terms were added to the topic; all the alternative terms of these two terms were also added to the topic. For
example, in topic 3005, the title is “Death marches”, and the most similar terms from the thesaurus are “death
marches” and “deaths during forced marches”; the alternative terms for these terms are “death march” and
“Todesmärsche”. Table 1 shows two entries from the thesaurus; each entry contains six types of fields: name
(a unique numeric code for the entry), label (a phrase or word that represents the entry), alt-label
(an alternative phrase or synonym for the entry), and usage (the usage or definition of the
entry), plus two relations, is-a and of-type, which contain the numeric code of the
entry involved in the relation. The second query expansion method extracts the most informative terms from the
top-returned documents as the expanded query terms. In this expansion process, 12 terms from the returned
documents (the top 15 documents) were added to the topic, based on the Bose-Einstein 1 model (Bo1) [
        <xref ref-type="bibr" rid="ref10 ref4">4, 10</xref>
        ]; we
restricted the new terms: their document frequency must be less than the maximum document
frequency of the terms in the topic title. The aim of this restriction is to avoid adding more general terms to the
topic. Any term that satisfies this restriction becomes part of the new topic. We also weighted the title
terms five times higher than the other terms in the topic.
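The document frequency restriction and the title weighting can be sketched in Python as follows. This is an illustration under stated assumptions: the candidate list is taken to be already ranked by an expansion model such as Bo1 (not implemented here), the top 12 candidates are taken before the restriction is applied, and the function names, the doc_freq table, and the toy frequencies are hypothetical.
        <preformat>
```python
def filter_expansion_terms(candidates, title_terms, doc_freq, max_terms=12):
    """Take the top-ranked candidates, then drop the overly general ones.

    A candidate survives only if its document frequency is strictly below
    the largest document frequency found among the title terms.
    """
    cap = max(doc_freq.get(term, 0) for term in title_terms)
    top = candidates[:max_terms]
    return [term for term in top if cap > doc_freq.get(term, 0)]


def build_query(title_terms, other_terms, expansion_terms, title_boost=5.0):
    """Give title terms a weight `title_boost` times that of other terms."""
    weights = {term: 1.0 for term in list(other_terms) + list(expansion_terms)}
    weights.update({term: title_boost for term in title_terms})
    return weights


# Hypothetical document frequencies and a pre-ranked candidate list.
doc_freq = {"death": 120, "marches": 40, "camp": 900, "evacuation": 35, "column": 60}
title = ["death", "marches"]
expansion = filter_expansion_terms(["camp", "evacuation", "column"], title, doc_freq)
query = build_query(title, ["forced"], expansion)
print(expansion)
```
        </preformat>
With these toy frequencies the cap is 120, so the very frequent term “camp” is rejected while “evacuation” and “column” are kept.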
      </p>
      <p>
        For the data fusion part, we proposed two methods that use the sum of normalized weighted similarity scores of
15 different IR schemes, as shown in the following formulas:
      </p>
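      <disp-formula>
        <label>(1)</label>
        <tex-math>\mathrm{Fusion1} = \sum_{i \in \mathrm{IR\ schemes}} \left[ W_{r}^{4}(i) + W_{\mathrm{MAP}}^{3}(i) \right] \cdot \mathrm{NormSim}_{i}</tex-math>
      </disp-formula>
      <disp-formula>
        <label>(2)</label>
        <tex-math>\mathrm{Fusion2} = \sum_{i \in \mathrm{IR\ schemes}} W_{r}^{4}(i) \cdot W_{\mathrm{MAP}}^{3}(i) \cdot \mathrm{NormSim}_{i}</tex-math>
      </disp-formula>
      <p>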
where W<sub>r</sub>(i) and W<sub>MAP</sub>(i) are experimentally determined weights based on the recall (the number of relevant
documents retrieved) and precision (MAP score) values of each IR scheme, computed on the training data. For
example, suppose that two retrieval runs r1 and r2 give 0.3 and 0.2 (respectively) as MAP scores on the training
data. We normalize these scores by dividing them by the maximum MAP value, so W<sub>MAP</sub>(r1) is 1 and W<sub>MAP</sub>(r2)
is approximately 0.67. We then raise these weights to the power 3, so that the first weight stays 1 and the second decreases
to approximately 0.30. We chose power 3 for the MAP score and power 4 for recall, because the MAP is more important than the recall.
We hope that when we multiply the similarity values by the weights and sum over all the runs,
the performance of the combined run will improve. NormSim<sub>i</sub> is the normalized similarity for each IR scheme.
We normalized by dividing each similarity by the maximum similarity in the run. The normalization is
necessary because different weighting schemes generate different ranges of similarity values, so a
normalization method should be applied to each run. Our method differs from the work done by Fox and Shaw
in 1994 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Lee in 1995 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]; they combined the results by taking the summation of the similarity scores
without giving any weight to each run. In our work we weight each run according to the precision and recall on
the training data.
      </p>
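      <p>The two fusion formulas can be sketched in Python as follows. This is a minimal illustration rather than the code used for the experiments. It assumes that the recall weights are normalized by the maximum in the same way as the MAP weights, that similarity scores are positive, and that a document missing from a run contributes zero for that run; the function names and the toy runs are hypothetical.</p>
      <preformat>
```python
def normalize(values):
    """Divide every value by the maximum so the best one becomes 1.0."""
    top = max(values.values())
    return {key: value / top for key, value in values.items()}


def fuse(runs, recall, map_score, method="fusion1"):
    """Combine several runs into one ranked list of (doc_id, score).

    runs:      {scheme: {doc_id: similarity}}
    recall:    {scheme: relevant documents retrieved on training data}
    map_score: {scheme: MAP on training data}
    """
    w_r = {s: w ** 4 for s, w in normalize(recall).items()}
    w_map = {s: w ** 3 for s, w in normalize(map_score).items()}
    fused = {}
    for scheme, sims in runs.items():
        if method == "fusion1":
            weight = w_r[scheme] + w_map[scheme]
        else:  # "fusion2"
            weight = w_r[scheme] * w_map[scheme]
        for doc_id, norm_sim in normalize(sims).items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * norm_sim
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Two hypothetical runs with the MAP values from the example in the text.
runs = {"r1": {"d1": 10.0, "d2": 5.0}, "r2": {"d2": 0.8, "d3": 0.4}}
recall = {"r1": 100, "r2": 100}
map_score = {"r1": 0.3, "r2": 0.2}
ranking1 = [doc for doc, _ in fuse(runs, recall, map_score, "fusion1")]
ranking2 = [doc for doc, _ in fuse(runs, recall, map_score, "fusion2")]
print(ranking1, ranking2)
```
      </preformat>
      <p>On this toy input the two methods rank the documents differently: the sum of weights in Fusion1 keeps the weaker run influential enough to promote d2, which both runs retrieve, whereas the product in Fusion2 shrinks the weaker run's contribution and leaves d1 first.</p>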
    </sec>
    <sec id="sec-3">
      <title>3 Experimental Results</title>
      <sec id="sec-3-1">
        <title>3.1 Submitted Runs</title>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Comparison of Systems and Query Expansion Methods</title>
        <p>To compare the query expansion methods against a base run without query expansion, we
selected the base run with the weighting scheme lnn.ntn, topic fields title and description, and document fields
ASRTEXT2004A, AUTOKEYWORD2004A1, and AUTOKEYWORD2004A2. We used the two techniques for query
expansion, one based on the thesaurus and the other on blind relevance feedback (denoted Bo1 in Table 3).
We present the results (MAP scores) with and without query expansion, and with the combination of both query
expansion methods, on the test and training topics. According to Table 3, both methods help to
improve the retrieval results, but the improvement is not significant on either the training or the test data. The
combination of the two methods improves the MAP score on the training data (not significantly), but not
on the test data.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Experiments using Data Fusion</title>
        <p>We applied the data fusion methods described in section 2 to 14 runs produced by SMART and one run
produced by Terrier; all runs were produced using a combination of the two query expansion methods
described in section 2. Performance results for each single run and for the fused runs are presented in Table 4, in which
the % change is given with respect to the run providing better effectiveness in each combination on the training data.
The Manual English column represents the results when only the manual keywords and the manual summaries
were used for indexing the documents, using English topics. The Auto-English column represents the results when
automatic fields are indexed from the documents (ASRTEXT2004A and AUTOKEYWORD2004A1, A2), using
English topics. For the cross-language experiments, the results are presented in the columns Auto-French and
Auto-Spanish.</p>
        <p>Data fusion helps to improve the performance (MAP score) on the test data. The best improvement using data
fusion (Fusion1) was on the French cross-language experiments, with 21.7%, which is statistically significant,
while on the monolingual experiments the improvement was only 6.5%, which is not significant. There is also an improvement in
the number of relevant documents retrieved (recall) for all the experiments, except Auto-French on the test data,
as shown in Table 5. We computed these improvements relative to the results of the best single-model run, as
measured on the training data. This supports our claim that data fusion improves the recall by bringing in some new
documents that were not retrieved by all the runs. On the training data, the Fusion2 method gives better results
than Fusion1 in all cases except Manual English, but on the test data Fusion1 is better than Fusion2. In
general, data fusion seems to help, because the performance on the test data is not always good for weighting
schemes that obtain good results on the training data, but combining models allows the best-performing
weighting schemes to be taken into consideration.</p>
        <p>The retrieval results for the translations from French were very close to the monolingual English results,
especially on the training data, but on the test data they were significantly worse. For Spanish, the
results were significantly worse on the training data, but not on the test data.</p>
        <p>Experiments on manual keywords and manual summaries showed large improvements: the MAP score rose
from 0.0855 to 0.2761 on the test data.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Conclusion</title>
      <p>We experimented with two different systems, Terrier and SMART, combining various weighting
schemes for indexing the document and query terms. We proposed two approaches for query expansion, one
based on the thesaurus and another based on blind relevance feedback. The combination of the query
expansion methods obtained a small improvement on the training and test data (not statistically significant
according to a Wilcoxon signed-rank test).</p>
      <p>Our focus this year was on data fusion: we proposed two methods to combine different weighting schemes
from different systems, based on a weighted summation of normalized similarity measures; the weight for each
scheme was based on its relative precision and recall on the training data. Data fusion improves
retrieval significantly for some experiments (Auto-French) and not significantly for others (Manual English).</p>
      <p>The idea of using multiple translations proved to be good. More variety in the translations would be
beneficial. The online MT systems that we used are rule-based systems. Adding translations by statistical MT
tools might help, since they could produce radically different translations.</p>
      <p>Combining query expansion methods and data fusion helped to improve the retrieval significantly compared
with the median and average of all required runs submitted by all the teams that participated in the track.</p>
      <p>In future work we plan to investigate more data fusion methods, to remove or correct some of the speech
recognition errors in the ASR content words, and to use speech lattices for indexing.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alzghool</surname>
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Inkpen</surname>
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Experiments for the Cross Language Speech Retrieval Task at CLEF 2006</article-title>
          .
          <source>In Proceedings of CLEF 2006, Lecture Notes in Computer Science</source>
          , Springer-Verlag 4730,
          <year>2007</year>
          , pp.
          <fpage>778</fpage>
          -
          <lpage>785</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Amati</surname>
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>van Rijsbergen</surname>
            <given-names>C. J.</given-names>
          </string-name>
          :
          <article-title>Probabilistic models of information retrieval based on measuring the divergence from randomness</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          , Vol.
          <volume>20</volume>
          , No. 4,
          October
          (
          <year>2002</year>
          )
          <fpage>357</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Buckley</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salton</surname>
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Allan</surname>
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Automatic retrieval with locality information using SMART</article-title>
          .
          <source>In Text REtrieval Conference (TREC-1)</source>
          ,
          March
          (
          <year>1993</year>
          )
          <fpage>59</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Carpineto</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Mori</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romano</surname>
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bigi</surname>
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>An information-theoretic approach to automatic query expansion</article-title>
          .
          <source>ACM Transactions on Information Systems (TOIS)</source>
          , Vol.
          <volume>19</volume>
          , No. 1,
          January
          (
          <year>2001</year>
          )
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Shaw</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          (
          <year>1994</year>
          ).
          <article-title>Combination of multiple searches</article-title>
          .
          <source>Proceedings of the Third Text REtrieval Conference (TREC-3)</source>
          .
          <source>National Institute of Standards and Technology Special Publication</source>
          <volume>500</volume>
          -215.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Inkpen</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alzghool</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          and
          <string-name>
            <surname>Islam</surname>
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Using various indexing schemes and multiple translations in the CL-SR task at CLEF 2005</article-title>
          .
          <source>In Accessing Multilingual Information Repositories, 6th Workshop of the Cross-Language Evaluation Forum</source>
          ,
          CLEF 2005, Vienna, Austria, 21-23 September (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          (
          <year>1995</year>
          ).
          <article-title>Combining multiple evidence from different properties of weighting schemes</article-title>
          .
          <source>Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , pp.
          <fpage>180</fpage>
          -
          <lpage>188</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Oard</surname>
            <given-names>D.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soergel</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doermann</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murray</surname>
            <given-names>G.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramabhadran</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franz</surname>
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gustman</surname>
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Building an Information Retrieval Test Collection for Spontaneous Conversational Speech</article-title>
          ,
          <source>in Proceedings of SIGIR</source>
          , (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Oard</surname>
            <given-names>D.W.</given-names>
          </string-name>
          , J.,
          <string-name>
            <surname>Jones</surname>
            <given-names>G. J. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pecina</surname>
            <given-names>P.</given-names>
          </string-name>
          , et al:
          <article-title>Overview of the CLEF 2007 cross-language speech retrieval track</article-title>
          .
          <source>In Working Notes of the CLEF-2007 Evaluation</source>
          , Budapest, Hungary, (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ounis</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amati</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plachouras</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Macdonald</surname>
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Johnson</surname>
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Terrier Information Retrieval Platform</article-title>
          .
          <source>In 27th European Conference on Information Retrieval (ECIR 05)</source>
          , (
          <year>2005</year>
          ). http://ir.dcs.gla.ac.uk/wiki/Terrier
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Salton</surname>
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Buckley</surname>
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Term-weighting approaches in automatic retrieval</article-title>
          .
          <source>Information Processing and Management</source>
          , Vol.
          <volume>24</volume>
          , No.
          <volume>5</volume>
          , (
          <year>1988</year>
          )
          <fpage>513</fpage>
          -
          <lpage>523</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>