<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cross-language Retrieval at Twente and TNO.</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dennis Reidsma</string-name>
          <email>reidsma@cs.utwente.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Djoerd Hiemstra</string-name>
          <email>hiemstra@cs.utwente.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Franciska de Jong</string-name>
          <email>fdejong@cs.utwente.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wessel Kraaij</string-name>
          <email>kraaij@tpd.tno.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <addr-line>P.O. Box 217, 7500 AE Enschede</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TNO TPD</institution>
          ,
          <addr-line>P.O. Box 155, 2600 AD Delft</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Twente, Dept. of Computer Science</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the official runs of the Twenty-One group for CLEF-2002. The Twenty-One group participated in the Dutch and Finnish monolingual and the Dutch bilingual tasks. This paper also reports on an experiment that was carried out during the assessment work. The experiment was designed to examine possible influences on the assessments caused by the use of highlighting in the assessment program.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This paper describes the CLEF participation of the Twenty-One group.</p>
      <p>Section 2 provides the context in which research on multilingual information retrieval is carried
out at TNO TPD and the University of Twente. Section 3 discusses the Dutch and Finnish runs
that the Twenty-One group submitted to CLEF 2002: first the retrieval model is described
(section 3.1), after which our submissions are presented. Section 4 describes and analyses an
experiment that was carried out on some aspects of the assessment protocol.</p>
      <p>... to their translation equivalents. Ambiguity resolution and other problems inherent to CLIR
tasks are circumvented in this concept-search-like approach. However, there is always the additional
user requirement of being able to search for terms that are not in the controlled list. Therefore, even
in ontology-driven projects such as MUMIS, the type of CLIR functionality that is central to the
current CLEF campaign remains relevant in the multimedia domain.</p>
    </sec>
    <sec id="sec-2">
      <title>Retrieval experiments on the Dutch and Finnish document set</title>
      <p>The Twenty-One group participated in the Dutch and Finnish monolingual tasks and the Dutch
bilingual task. In this section we present the retrieval model (section 3.1) and discuss the scores
for the different tasks.</p>
      <sec id="sec-2-1">
        <title>The retrieval model</title>
        <p>
          Runs were carried out with an information retrieval system based on a simple unigram language
model. The basic idea is that documents can be represented by simple statistical language models.
Now, if a query is more probable given a language model based on document d1 than given, e.g., a
language model based on document d2, we hypothesise that document d1 is more likely to be
relevant to the query than document d2. Thus the probability of generating a certain query
given a document-based language model can serve as a score to rank the documents.

P(T1, T2, ..., Tn|D) P(D) = P(D) ∏_{i=1}^{n} ((1 − λ) P(Ti) + λ P(Ti|D))   (1)

Formula 1 shows the basic idea of this approach to information retrieval, where the document-based
language model P(Ti|D) is interpolated with a background language model P(Ti) to compensate
for sparseness. In the formula, Ti is a random variable for the query term on position i in the
query (1 ≤ i ≤ n, where n is the query length), whose sample space is the set of all terms in the
collection. The probability measure P(Ti) defines the probability of drawing a term at random
from the collection, P(Ti|D) defines the probability of drawing a term at random from the
document, and λ is the smoothing parameter, which is set to λ = 0.15. The marginal probability
of relevance P(D) is assumed to be uniformly distributed over the documents, in which case it may
be ignored in the above formula. For a description of the embedding of statistical word-by-word
translation into our retrieval model, we refer to [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
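        <p>As a concrete illustration of formula 1, the interpolated scoring can be sketched in a few lines of Python. This is a minimal sketch with our own helper names, not the actual Twenty-One engine; documents and the collection are plain token lists, and query terms are assumed to occur at least once in the collection so the logarithm is defined.

```python
import math
from collections import Counter

LAMBDA = 0.15  # smoothing parameter, as in the paper


def score(query_terms, doc_terms, collection_terms):
    """Log-probability of generating the query from the document model,
    interpolated with the background (collection) model; P(D) is uniform
    and therefore omitted from the ranking score."""
    doc_tf, col_tf = Counter(doc_terms), Counter(collection_terms)
    s = 0.0
    for t in query_terms:
        p_bg = col_tf[t] / len(collection_terms)  # P(T_i): background model
        p_doc = doc_tf[t] / len(doc_terms)        # P(T_i|D): document model
        s += math.log((1 - LAMBDA) * p_bg + LAMBDA * p_doc)
    return s
```

Documents are then ranked by this score in descending order; a document in which the query terms are relatively frequent receives a higher query-generation probability.
        </p>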
      </sec>
      <sec id="sec-2-2">
        <title>The Dutch runs</title>
        <p>For Dutch, three separate runs were submitted. First there was the manual run, in which we had
a special interest because of our role in the assessment of all the runs submitted for Dutch (cf.
section 4). The expected effect of submitting a run for which the queries were manually created
from the topics was to increase the size and quality of the pool of documents to be assessed. The
engine applied was a slightly modified version of the NIST Z/Prise 2.0 system.</p>
        <p>
          The Dutch bilingual run is an automatic run done with the TNO retrieval system (also referred
to as the Twenty-One engine) as developed and used for previous CLEF participations [
          <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
          ].
Furthermore, we used the VLIS lexical database developed by Van Dale Lexicography and the
morphological analyzers developed by Xerox Research Centre Grenoble.
        </p>
        <p>For completeness, we performed a post-evaluation automatic monolingual Dutch run. Mean average
precision figures for the three runs are given in Table 1.</p>
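        <p>Mean average precision, the figure reported in Table 1, is computed per topic and then averaged. A minimal sketch with our own helper names, not the official CLEF evaluation software:

```python
def average_precision(ranked_ids, relevant_ids):
    """Uninterpolated average precision for one topic: the mean of the
    precision values at the ranks where relevant documents appear."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """Mean over topics; `runs` is a list of (ranking, relevant-set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```
        </p>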
      </sec>
      <sec id="sec-2-3">
        <title>The Finnish run</title>
        <p>
          [Table 1: mean average precision for the submitted runs, labelled tnoutn1, tnoen1, tnofifi1 and tnonn1]
          Since we did not have a Finnish morphological analyzer or stemmer, we decided to apply an
n-gram approach, which has been advocated as a language-independent, knowledge-poor approach
by McNamee and Mayfield [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. After applying a stoplist and lowercasing, documents and queries
were indexed by character 5-grams. Unlike the JHU approach, the 5-grams did not span word
boundaries. This extremely simple approach turned out to be very effective: for almost all topics
the score of this run was at least as high as the median score.
        </p>
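        <p>The indexing step described above can be sketched as follows. How words shorter than five characters were handled is not stated here, so keeping them whole is our assumption:

```python
def char_ngrams(text, n=5, stopwords=frozenset()):
    """Index terms as character n-grams (n=5 in our run) that do not
    span word boundaries, after stoplisting and lowercasing.
    Words of length <= n are kept whole (an assumption)."""
    grams = []
    for word in text.lower().split():
        if word in stopwords:
            continue
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams
```

The same function is applied to both documents and queries, so matching happens entirely in 5-gram space.
        </p>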
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Assessment of the Dutch results</title>
      <p>The University of Twente was responsible for assessing the results for the Dutch newspaper
collections (articles from the newspapers ’NRC Handelsblad’ and ’Algemeen Dagblad’). Besides
assessing all topics in the standard way for the official ranking of the submitted runs, we also
repeated some assessments without allowing highlighting of search terms. This section discusses
the motivation for this additional experiment and reports on the findings.
</p>
      <sec id="sec-3-1">
        <title>Introduction</title>
        <p>The program used to do the assessments was developed at NIST and offers the possibility to highlight
terms in the documents. Highlighting words and phrases for which a search engine has detected
a relation to the query terms might make it easier for the assessor to decide on the relevance of a
document. Usually the assessor is told explicitly that the presence or absence of highlighted
terms in a document is not decisive in marking a document relevant. The assumption is that using
or not using highlighting will not influence the assessment results, or more specifically the ranking
of the search engines that follows from those results.</p>
        <p>We think, however, that this assumption can be questioned. The following subsection explains
how highlighting can affect the assessments and how, therefore, the use of highlighting may influence
the ranking of search engines. We then describe a simple experiment that we carried out to detect
such differences.</p>
        <p>If the assessment process were indeed seriously influenced by the use of highlighting, the
implications would be large. Not only would the assessment protocol have to change, but the
validity of the assessments of previous years would also have to be reconsidered.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Possible influences of highlighting on assessment results</title>
        <p>We wanted to investigate two different aspects of the assessment results that might be affected
by the use of highlighting. The first is the number of documents that are marked as relevant; the
second is the score of the participating search engines. We did not expect to find hard statistical
evidence for the presence or absence of either influence, given the size of the test data, but
rather expected some trend to show up, which would warrant further investigation.</p>
        <p>The number of relevant documents. Using highlighting might result in more (or fewer)
documents being marked as relevant. Although the assessors are explicitly told not to let the
highlighting affect their judgement, it is still possible that this happens unintentionally. For example,
assessors might read the documents where terms are highlighted less thoroughly, missing in those
documents the relevant parts which do not contain highlighted terms. Or the assessors might simply
be biased in favor of documents containing highlighted terms.</p>
        <p>The scores of search engines. If the assessors are indeed biased towards documents containing
highlighted terms, this might influence the scores of the search engines. After all, many search
engines rely on detecting the presence of query words for marking documents as relevant. In that case,
those engines would perform better with the biased assessments than with assessments produced
without using highlighting.</p>
      </sec>
      <sec id="sec-3-3">
        <title>The experiments</title>
        <p>The experiment was simple: 18 topics were each assessed at least twice, once with and once without
highlighting. These assessments were distributed randomly over 10 assessors, in such a way that every
assessor did some assessments with and some without highlighting and no one assessed the same topic
twice. The assessors were not allowed to talk to each other about these assessments until all
assessments were finished.</p>
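        <p>One way to realise such a design programmatically is rejection sampling over random round-robin assignments. This is a sketch of one possible procedure under our constraints (no assessor sees the same topic twice, every assessor works in both conditions), not the procedure actually used:

```python
import random


def assign(topics, assessors, seed=0):
    """Distribute (topic, condition) jobs over assessors, resampling
    until no assessor has a repeated topic and every assessor has
    jobs in both the highlighting and the plain condition."""
    rng = random.Random(seed)
    jobs = [(t, c) for t in topics for c in ("highlight", "plain")]
    while True:
        rng.shuffle(jobs)
        plan = {a: [] for a in assessors}
        for i, job in enumerate(jobs):
            plan[assessors[i % len(assessors)]].append(job)
        if all(len({t for t, _ in js}) == len(js)
               and {c for _, c in js} == {"highlight", "plain"}
               for js in plan.values()):
            return plan
```

With 18 topics and 10 assessors this yields 36 assessments, three or four per assessor, and the constraints are satisfied after a handful of resampling rounds.
        </p>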
      </sec>
      <sec id="sec-3-4">
        <title>The results</title>
        <p>The results of this experiment were not conclusive. For half of the topics, the assessments with
highlighting resulted in more relevant documents than the assessments without highlighting; for
the rest of the topics it was the other way around. Viewed from the perspective of the assessors,
using highlighting also did not result in significantly more or fewer relevant documents relative to
the other assessors working on the same topic.</p>
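        <p>A simple way to quantify whether such a per-topic split (half the topics each way) deviates from chance is a two-sided sign test over the topics. A sketch of one possible analysis, not the test actually applied:

```python
from math import comb


def sign_test_p(n_more, n_fewer):
    """Two-sided sign-test p-value: probability, under the null
    hypothesis that highlighting has no effect, of a split over the
    topics at least as uneven as the one observed (ties dropped)."""
    n = n_more + n_fewer
    k = min(n_more, n_fewer)
    one_sided = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)
```

An even 9-versus-9 split over 18 topics gives a p-value of 1.0, i.e. no evidence of an effect, which matches the outcome reported above.
        </p>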
      </sec>
      <sec id="sec-3-5">
        <title>Conclusion</title>
        <p>There was no discernible trend that confirmed our expectations. However, we could only test the
first aspect described above; we did not have the necessary data to test the effect of the highlighting
on the scores of the search engines. This second aspect, however, is where we expected the most
interesting results. We therefore recommend testing it as well. If the amount of data is too
small to get reliable results, more data should be collected. If the results show a significant change
in the scores of the search engines when highlighting is turned off, the assessment protocol should
be reconsidered. It is possible that the benefits of highlighting do not outweigh the adverse
effects on the quality of the assessments, in which case highlighting should no longer be used.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Kraaij</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pohlmann</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Westerveld</surname>
          </string-name>
          .
          <article-title>Translation resources, merging strategies and relevance feedback for cross-language information retrieval</article-title>
          .
          <source>In Cross-language Information Retrieval and Evaluation, Lecture Notes in Computer Science (LNCS-2069)</source>
          , Springer-Verlag, pages
          <fpage>102</fpage>
          -
          <lpage>115</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Kraaij</surname>
          </string-name>
          .
          <article-title>TNO at CLEF-2001: Comparing Translation Resources</article-title>
          .
          <source>In Working Notes of CLEF 2001 Workshop</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>McNamee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mayfield</surname>
          </string-name>
          .
          <article-title>A Language-Independent Approach to European Text Retrieval</article-title>
          .
          <source>In Cross-language Information Retrieval and Evaluation, Lecture Notes in Computer Science (LNCS-2069)</source>
          , Springer-Verlag, pages
          <fpage>102</fpage>
          -
          <lpage>115</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>de Jong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-L.</given-names>
            <surname>Gauvain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dj.</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Netter</surname>
          </string-name>
          .
          <article-title>Language-Based Multimedia Information Retrieval</article-title>
          .
          <source>In Content-Based Multimedia Information Access, RIAO 2000 Conference Proceedings</source>
          ,
          <year>2000</year>
          , ISBN 2-905450-07-X, C.I.D.-C.A.S.I.S., Paris,
          <fpage>713</fpage>
          -
          <lpage>722</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>