<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Some Experiments with the Dutch Collection</article-title>
      </title-group>
      <contrib-group>
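        <contrib contrib-type="author">
          <string-name>Arjen P. de Vries</string-name>
          <email>arjen@acm.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anne Diekema</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CWI</institution>, Amsterdam,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CNLP, Syracuse University</institution>, Syracuse NY,
          <country country="US">USA</country>
        </aff>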
      </contrib-group>
      <abstract>
        <p>We performed some basic monolingual Dutch and bilingual English-Dutch experiments. The retrieval approach is very basic, without stemming or decompounding, using only a simple language model to rank the documents. In the bilingual task, the English queries are analyzed by a system for question answering. The resulting queries are translated by dictionary lookup and ranked by the same basic retrieval system used in the monolingual task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Inspired by shared observations at the first CLEF workshop (‘We want better resources.’), the authors decided to initiate work
on retrieval experiments that help understand the effect of the quality of the translation process
on retrieval results.</p>
      <p>We are primarily interested in the problem of multi-lingual retrieval from Dutch collections,
and as we believe the quality of the resources for translation to be a significant factor in the retrieval
results, we decided to focus on the bilingual English to Dutch task. Limiting ourselves to this
task, we could deploy two high quality resources:
component in an automatic translation system. As a workaround, we developed a screen-scraping
tool based on the Win32 modules for the Perl scripting language: we discovered that the Van
Dale application supports requests to look up query words using DDE (Dynamic Data Exchange, a Windows protocol for
data exchange). The results, displayed in the results pane, are copied through the Clipboard into
our script by emulating the right sequence of keystrokes. The data captured on the Clipboard is
then parsed and converted into a query-specific dictionary. Because some terms generate a large
number of alternative translations for many different senses, we set some ad-hoc thresholds: at
most 10 translations per term, drawn from at most 5 different senses, and never more than 3
translations per sense.</p>
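      <p>The three thresholds can be combined in several ways. The following is a minimal sketch of one possible reading, assuming that senses and translations are taken in dictionary order; the function name and the list-of-lists input format are hypothetical and do not describe our Perl script.</p>
      <preformat><![CDATA[
```python
from typing import List

MAX_PER_TERM = 10   # at most 10 translations per query term
MAX_SENSES = 5      # drawn from at most 5 different senses
MAX_PER_SENSE = 3   # never more than 3 translations per sense


def cap_translations(senses: List[List[str]]) -> List[str]:
    """Select translations for one term under the three thresholds.

    `senses` is a hypothetical representation: one list of candidate
    translations per dictionary sense, in dictionary order.
    """
    selected: List[str] = []
    for sense in senses[:MAX_SENSES]:
        for translation in sense[:MAX_PER_SENSE]:
            if len(selected) == MAX_PER_TERM:
                return selected
            if translation not in selected:  # skip repeats across senses
                selected.append(translation)
    return selected


if __name__ == "__main__":
    # Seven artificial senses with six candidates each.
    demo = [[f"s{i}t{j}" for j in range(6)] for i in range(7)]
    print(cap_translations(demo))
```
]]></preformat>
      <p>Since 5 senses with 3 translations each would give 15 candidates, the per-term limit of 10 is the binding constraint for highly ambiguous terms: under this reading the fourth sense contributes a single translation and the fifth none.</p>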
      <p>The results of these two steps (as well as the documents in the collection) are converted to
lowercase and stripped of ‘strange’ characters, and stopwords are removed. The retrieval backend is
a database implementation of the simple – but proven effective – language models developed by
Hiemstra [Hie01] (more information about the retrieval backend is given in [HdV00]). We intended
to perform our experiments with an improved implementation that processes both phrases and
disjunctive queries, but we did not finish this implementation work in time, so we used the
simple term-based model.</p>
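      <p>As an illustration of this pipeline, the sketch below normalizes text and ranks documents with a term-based unigram language model that linearly interpolates a document model with a collection model. It is a sketch under stated assumptions, not the database implementation: the stopword list, the interpretation of ‘strange’ characters as anything that is not a letter, and the interpolation weight are all hypothetical choices made for the example.</p>
      <preformat><![CDATA[
```python
import math
import re
from collections import Counter
from typing import Dict, List

# Both values below are illustrative assumptions, not taken from the paper.
STOPWORDS = {"de", "het", "een", "en", "van", "in"}
LAMBDA = 0.15  # weight of the document model against the collection model


def normalize(text: str) -> List[str]:
    """Lowercase, keep only runs of letters, and drop stopwords."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


def score(query: List[str], doc: List[str], collection: Counter, total: int) -> float:
    """Log-probability of the query under a smoothed unigram document model."""
    tf = Counter(doc)
    log_p = 0.0
    for term in query:
        p_doc = tf[term] / len(doc) if doc else 0.0
        p_col = collection[term] / total if total else 0.0
        p = LAMBDA * p_doc + (1.0 - LAMBDA) * p_col
        if p > 0.0:  # terms absent from the whole collection are skipped
            log_p += math.log(p)
    return log_p


def rank(query_text: str, docs: Dict[str, str]) -> List[str]:
    """Return document ids ordered from best to worst score."""
    tokenized = {doc_id: normalize(text) for doc_id, text in docs.items()}
    collection = Counter(tok for toks in tokenized.values() for tok in toks)
    total = sum(collection.values())
    query = normalize(query_text)
    return sorted(
        tokenized,
        key=lambda doc_id: score(query, tokenized[doc_id], collection, total),
        reverse=True,
    )
```
]]></preformat>
      <p>Every query term is treated independently here; phrases and disjunctions, which the improved implementation is meant to handle, have no special status in such a model.</p>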
    </sec>
    <sec id="sec-2">
      <title>Analysis</title>
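      <table-wrap id="tab-1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Run</th><th>Mean average precision</th></tr>
          </thead>
          <tbody>
            <tr><td>AAmoNLtd</td><td>0.399</td></tr>
            <tr><td>AAmoNLt</td><td>0.348</td></tr>
            <tr><td>AAbiENNLtd</td><td>0.162</td></tr>
            <tr><td>AAbiENNLt</td><td>0.133</td></tr>
          </tbody>
        </table>
      </table-wrap>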
      <p>The results of our experiments are summarized in Table 1. We submitted four runs, their
names encoding the task – monolingual (‘mo’) or bilingual (‘bi’) – and the portion of the topics
that has been processed: title only (‘t’) or title and description (‘td’).</p>
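      <p>The naming scheme can be decoded mechanically. The sketch below is one way to do so; reading the ‘AA’ prefix as an opaque group identifier and the uppercase letters (‘NL’, ‘ENNL’) as language codes is our assumption for this example, since only the task and topic-field parts are explained above.</p>
      <preformat><![CDATA[
```python
import re
from typing import Dict

# Prefix, task code, one or more two-letter uppercase codes, topic fields.
RUN_NAME = re.compile(r"^AA(mo|bi)((?:[A-Z]{2})+)(td|t)$")


def decode_run(name: str) -> Dict[str, str]:
    """Split a run name into task, language codes, and topic fields."""
    match = RUN_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized run name: {name}")
    task, languages, fields = match.groups()
    return {
        "task": "monolingual" if task == "mo" else "bilingual",
        "languages": languages,
        "topic_fields": "title and description" if fields == "td" else "title only",
    }


if __name__ == "__main__":
    for run in ("AAmoNLtd", "AAmoNLt", "AAbiENNLtd", "AAbiENNLt"):
        print(run, decode_run(run))
```
]]></preformat>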
      <p>The difference in mean average precision between title-only and title-and-description topics is
surprisingly small in both tasks, especially since the title-only topics are quite short (2.5 words
on average). A very large difference is found for topic C110, which is easily explained:
the description gives the name ‘Kazem Radjavi’ and the title does not. Apparently, only
a small number of query terms really help to retrieve relevant documents, and those query terms
usually occur in both title and description. Further analysis is warranted to explain the small
drop in performance.</p>
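      <p>The size of these differences follows directly from the values in Table 1. The short sketch below only spells out that arithmetic; the variable and function names are ours, and no figures beyond the four reported scores are used.</p>
      <preformat><![CDATA[
```python
# Mean average precision per run, as reported in Table 1.
MAP = {
    "AAmoNLtd": 0.399,
    "AAmoNLt": 0.348,
    "AAbiENNLtd": 0.162,
    "AAbiENNLt": 0.133,
}


def relative_drop(reference: float, other: float) -> float:
    """Drop from `reference` to `other`, as a fraction of `reference`."""
    return (reference - other) / reference


# Effect of leaving out the description, per task.
mono_drop = relative_drop(MAP["AAmoNLtd"], MAP["AAmoNLt"])
bi_drop = relative_drop(MAP["AAbiENNLtd"], MAP["AAbiENNLt"])

# Bilingual score as a fraction of the monolingual score, per topic portion.
td_ratio = MAP["AAbiENNLtd"] / MAP["AAmoNLtd"]
t_ratio = MAP["AAbiENNLt"] / MAP["AAmoNLt"]

print(f"monolingual, td to t: {mono_drop:.1%} lower")    # 12.8% lower
print(f"bilingual, td to t: {bi_drop:.1%} lower")        # 17.9% lower
print(f"bilingual / monolingual (td): {td_ratio:.1%}")   # 40.6%
print(f"bilingual / monolingual (t): {t_ratio:.1%}")     # 38.2%
```
]]></preformat>
      <p>In absolute terms, dropping the description costs 0.051 in the monolingual task and 0.029 in the bilingual task, whereas moving from monolingual to bilingual costs more than half of the score.</p>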
      <p>The significantly lower mean average precision of the bilingual runs, compared to
the monolingual runs, demonstrates that the query translation component of our system requires
more work. A main cause of our disappointing results is the approach of using the Van Dale
dictionary through screen-scraping. First, communication via Clipboard cut-and-paste sometimes
malfunctioned: the data would not appear on the Clipboard, probably due to timing problems.
This makes it particularly difficult to check whether the dictionary contains no translation or
the answer is missing due to communication problems. For example, a term like ‘space probe’ is
found in the Van Dale dictionary, but the translation was unfortunately ‘dropped’ by our script
(other examples are ‘telephone’, ‘administration’ and ‘fishing’). Second, the Van Dale application
performs a fuzzy match if the query term is not found, but checking whether that has happened
would require an additional cut-and-paste from a different text pane. Finally, the copied data is
not trivially interpreted by a machine, as it contains instructions aimed at people, such as
‘compare to’ or ‘see also’.</p>
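      <p>One way to keep the first failure mode apart from a genuinely missing entry is to retry the transfer and record the outcome explicitly. The sketch below illustrates the idea only; the <monospace>fetch</monospace> callable, which is assumed to return nothing at all when no data arrived and an empty string when the dictionary has no entry, is hypothetical, and whether the Van Dale application allows these two cases to be told apart in this way is itself an assumption.</p>
      <preformat><![CDATA[
```python
import time
from enum import Enum
from typing import Callable, Optional, Tuple


class Outcome(Enum):
    TRANSLATED = "translated"
    NO_ENTRY = "no entry"
    TRANSFER_FAILED = "transfer failed"


def lookup(
    term: str,
    fetch: Callable[[str], Optional[str]],
    retries: int = 3,
    delay: float = 0.0,
) -> Tuple[Outcome, str]:
    """Look up `term`, retrying when nothing was transferred.

    `fetch` is a hypothetical stand-in for the screen-scraping step:
    it returns None when no data arrived and a (possibly empty)
    string otherwise.
    """
    for _ in range(retries):
        captured = fetch(term)
        if captured is not None:
            text = captured.strip()
            if text:
                return Outcome.TRANSLATED, text
            return Outcome.NO_ENTRY, ""
        time.sleep(delay)  # give the application time before retrying
    return Outcome.TRANSFER_FAILED, ""
```
]]></preformat>
      <p>Terms that end in the failed state can then be logged and looked up again, rather than being silently treated as untranslatable.</p>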
      <p>A deeper problem with our current approach lies in the interaction between the different processing
steps. As a simple example, the front-end adds ‘Koweit’ as an alternative for ‘Kuwait’, but this
alternative does not exist in the dictionary and is therefore ignored further on in the process. In this particular
case, it does not hurt effectiveness during the retrieval step, as ‘Koweit’ will not be found in the
Dutch collection. A more interesting example of this problem arises when the additional
intelligence in the front-end actually reduces effectiveness. A particularly good example
is provided by topic 91, AI in Latin America. The L2L module makes two wrong assumptions
in this case: ‘AI’ here does not mean ‘Artificial Intelligence’, and ‘America’ does not always imply ‘United States’.
The final result of the process is the following complex query: (‘Artificial Intelligence’ ∨ (‘United
States’ ∨ ‘United States of America’ ∨ ‘US’ ∨ ‘USA’)). Of course, the original untranslated query
terms are added, but given that the retrieval model emphasizes frequent query words, most of
the results indeed discuss artificial intelligence in the US, and not Amnesty International in Latin
America as desired.</p>
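      <p>The effect can be reproduced with a toy expansion step. The sketch below applies word-level rules to the topic title; the rule table is hypothetical, modeled only on the example above, and is not the rule set of the L2L module.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List

# Hypothetical word-level expansion rules, modeled on the topic 91 example.
RULES: Dict[str, List[str]] = {
    "AI": ["Artificial Intelligence"],
    "America": ["United States", "United States of America", "US", "USA"],
}


def expand(title: str, rules: Dict[str, List[str]] = RULES) -> List[str]:
    """Return the original terms followed by every alternative a rule adds."""
    terms = title.split()
    expanded = list(terms)
    for term in terms:
        # The rule fires on the bare word, blind to the phrase around it.
        expanded.extend(rules.get(term, []))
    return expanded


if __name__ == "__main__":
    query = expand("AI in Latin America")
    print(query)
    # ['AI', 'in', 'Latin', 'America', 'Artificial Intelligence',
    #  'United States', 'United States of America', 'US', 'USA']
```
]]></preformat>
      <p>Five of the nine resulting query terms point at the unintended reading, because the ‘America’ rule fires even though the word occurs inside the phrase ‘Latin America’.</p>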
    </sec>
    <sec id="sec-3">
      <title>Next Steps</title>
      <p>Summarizing our current experimental results, we must conclude that we have not made much progress
since the first CLEF workshop. We still find ourselves in need of basic resources such as the
dictionary: while the chosen ‘Van Dale’ seems an excellent tool for interactive use during a word
processing session on a Windows desktop, its application in an automatic retrieval system seems
difficult. Although some of the ambiguities in the translation instructions can be interpreted
automatically, the current screen-scraping solution is not sufficiently reliable to base further retrieval
experiments upon.</p>
      <p>A more interesting challenge is to find a balance between the sophisticated (but sometimes
mistaken) analysis of the query in the L2L module, and the brute-force term counting of the
statistical retrieval model. While we are convinced that it should (some day) be possible to
improve retrieval results with an intelligent analysis of the query, it is not yet clear how to detect
– without human intervention – that a rule like ‘America → US’ does not apply to ‘Latin America’.</p>
      <p>The most urgent work to be done, however, is to finalize the implementation of an
adaptation of our retrieval model that takes into account phrases and disjunctions. Even though most
components are in place, more engineering is needed before these experiments can be performed.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [DLC+01]
          <string-name>
            <given-names>A.</given-names>
            <surname>Diekema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>McCracken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yilmazel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.D.</given-names>
            <surname>Liddy</surname>
          </string-name>
          .
          <article-title>Question Answering: CNLP at the TREC-9 Question Answering Track</article-title>
          . In E.M. Voorhees and D.K. Harman, editors,
          <source>Proceedings of the Ninth Text Retrieval Conference TREC-9</source>
          , number 500-249 in NIST Special publications, pages
          <fpage>501</fpage>
          -
          <lpage>510</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [HdV00]
          <string-name>
            <given-names>Djoerd</given-names>
            <surname>Hiemstra</surname>
          </string-name>
          and
          <string-name>
            <given-names>Arjen</given-names>
            <surname>de Vries</surname>
          </string-name>
          .
          <article-title>Relating the new language models of information retrieval to the traditional retrieval models</article-title>
          .
          <source>Technical Report TR-CTIT-00-09</source>
          , Centre for Telematics and Information Technology, May
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [vD97]
          <source>Groot woordenboek Nederlands-Engels/Engels-Nederlands</source>
          . CD-ROM,
          <year>1997</year>
          . Versie 1.0.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>