<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>University of Indonesia Participation at IMAGE-CLEF 2005</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Mirna Adriani and Framadhan, Faculty of Computer Science, University of Indonesia, Depok 16424</institution>
          ,
          <country country="ID">Indonesia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present a report on our participation in the English-Indonesian image ad hoc task of the 2005 Cross-Language Evaluation Forum (CLEF). We chose to translate the Indonesian query set into English using a commercial machine translation tool called Transtool. We show that some improvement in retrieval effectiveness can be obtained by combining the results of running each query against the text and against the images, whereas query expansion did not help.</p>
      </abstract>
      <kwd-group>
        <kwd>image retrieval</kwd>
        <kwd>cross-language information retrieval</kwd>
        <kwd>machine translation</kwd>
        <kwd>query expansion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This year we, the University of Indonesia IR group, participated in the bilingual ad hoc image retrieval task of the Cross Language
Evaluation Forum (CLEF) 2005, i.e., Indonesian-English cross-language information retrieval (CLIR). We used commercial machine
translation software called Transtool to translate an Indonesian query set into English. We learned from our
previous work [1, 2] that dictionaries freely available on the Internet failed to provide correct translations for
many query terms because their vocabulary was very limited. We hoped that machine translation would improve
the results.</p>
    </sec>
    <sec id="sec-2">
      <title>Combining the Scores of the Text and the Image</title>
      <p>The short caption attached to each image in the collection was indexed using Lucene<sup>2</sup>, an open source
indexing and retrieval engine, and the images themselves were indexed using GIFT<sup>3</sup>. We combined the scores of the
text and the image retrieval in order to get a better result. The text was given more weight because the image
retrieval effectiveness that we obtained using GIFT was poor. We took the two example images given by
CLEF and ran them as query-by-example searches through GIFT over the collection. We combined the color
histogram, texture histogram, color block, and texture block features in order to find the images most
similar to the two examples. The text score was given a weight of 0.8 and the image score a weight of 0.2. These
weights were chosen after comparing a number of different weight configurations in our initial experiments.</p>
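      <p>The weighted combination described above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the function names, the min-max normalization step, and the choice to score a missing image as 0.0 in one modality are all assumptions, since the paper states only the 0.8 and 0.2 weights.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to the range [0, 1].

    Assumption: the paper does not say how (or whether) the Lucene and
    GIFT scores were normalized before being combined.
    """
    if not scores:
        return {}
    lowest, highest = min(scores.values()), max(scores.values())
    if highest == lowest:
        # All scores are equal, so treat every document as a full match.
        return {doc: 1.0 for doc in scores}
    span = highest - lowest
    return {doc: (score - lowest) / span for doc, score in scores.items()}


def combine_scores(
    text_scores: Dict[str, float],
    image_scores: Dict[str, float],
    text_weight: float = 0.8,   # weight reported in the paper
    image_weight: float = 0.2,  # weight reported in the paper
) -> List[Tuple[str, float]]:
    """Return (image_id, combined_score) pairs, best first."""
    text_norm = min_max_normalize(text_scores)
    image_norm = min_max_normalize(image_scores)
    combined = {}
    for doc in set(text_norm) | set(image_norm):
        # Assumption: an image missing from one result list scores 0.0 there.
        combined[doc] = (text_weight * text_norm.get(doc, 0.0)
                         + image_weight * image_norm.get(doc, 0.0))
    # Sort by descending score, then by id so that ties are deterministic.
    return sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical scores for four images (not taken from the experiments).
example_text = {"img_a": 2.0, "img_b": 1.0, "img_c": 0.0}
example_image = {"img_b": 0.9, "img_c": 0.1, "img_d": 0.5}
example_ranking = combine_scores(example_text, example_image)
```
]]></preformat>
      <p>With a text weight this high, an image that matches only visually cannot outrank one with a strong caption match, which is consistent with the decision to trust the text more than the GIFT results.</p>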
    </sec>
    <sec id="sec-3">
      <title>Experiment</title>
      <p>The image collection contains 28,133 images from the St. Andrews image collection, each with a short caption in
English. We participated in the bilingual task using Indonesian query topics. We opted to use the query title and
the narrative for all of the available 28 topics. The query translation process was performed fully automatically using
the Transtool machine translation software.</p>
      <p>We then applied the pseudo relevance-feedback query-expansion technique to the translated queries. We used
the top 20 documents from the Glasgow Herald collection to extract the expansion terms.</p>
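      <p>A pseudo relevance-feedback expansion step of this kind might look like the sketch below. The paper specifies only that the top 20 documents were used; the tokenizer, the small stopword list, the term-selection rule (how many of the top documents contain a term), and the number of added terms are illustrative assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from collections import Counter
from typing import List, Sequence

# Assumption: a tiny illustrative stopword list, not the one used in the paper.
STOPWORDS = frozenset({"the", "a", "an", "of", "in", "and", "to", "on", "is", "at"})


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def expand_query(
    query: str,
    ranked_docs: Sequence[str],
    top_docs: int = 20,   # number of feedback documents reported in the paper
    num_terms: int = 5,   # assumption: the paper does not state this value
) -> List[str]:
    """Append the most widespread terms of the top-ranked documents to the query."""
    query_terms = tokenize(query)
    already_in_query = set(query_terms)
    doc_counts: Counter = Counter()
    for doc in ranked_docs[:top_docs]:
        # Count each term once per document (document frequency in the feedback set).
        for term in set(tokenize(doc)):
            if term not in STOPWORDS and term not in already_in_query:
                doc_counts[term] += 1
    # Highest count first; alphabetical order breaks ties deterministically.
    best = sorted(doc_counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return query_terms + [term for term, _ in best[:num_terms]]


# Hypothetical ranked documents returned for the initial translated query.
example_docs = [
    "The harbour at St Andrews with fishing boats",
    "Fishing boats in the harbour",
    "A castle on the hill",
]
example_expanded = expand_query("harbour scene", example_docs, top_docs=2, num_terms=2)
```
]]></preformat>
      <p>Because the feedback documents are assumed to be relevant without being checked, terms drawn from a collection that differs in topic or style from the target captions can pull the expanded query away from the original information need.</p>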
      <p>In these experiments, we used the Lucene information retrieval system to index and retrieve image captions
(text).</p>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>Our work focused on the bilingual task using Indonesian queries to retrieve images from the image
collection. The machine translation tool failed to translate 3 words in the titles and 8 words in the narratives. In
particular, it failed to translate Indonesian names of places or locations such as Skotlandia
(Scotland), Swis (Switzerland), and Irlandia (Ireland) into English. The average number of words in the Indonesian queries was
largely the same as in the resulting English versions.</p>
      <p>When the image score was taken into account in addition to the text, the results showed some improvement. For the title-based
retrieval, the image score increased the average retrieval precision by 7.9%; for the narrative-based retrieval, the
image score increased the average retrieval precision by 11.22%. However, the query expansion technique did
not improve the retrieval performance. It decreased the retrieval performance of the title-only retrieval by
30.01% and of the narrative-only retrieval by 10.94%.</p>
      <p><sup>2</sup> See http://lucene.apache.org/.</p>
      <p><sup>3</sup> See http://savannah.gnu.org/projects/gift/.</p>
      <sec id="sec-4-6">
        <p>[Table. Task: Bilingual; measure: P/R. Runs: Title + Expansion; Title + Image; Title + Narrative + Expansion; Title + Narrative + Image; Narrative; Narrative + Expansion; Narrative + Image. Only the value 0.2122 survives.]</p>
        <p>The retrieval effectiveness of the combined title and narrative queries was 1.88% worse than that of title-only retrieval,
but 14.45% better than that of narrative-only retrieval. Query expansion also decreased the retrieval
performance, by 7.25% compared to the combined title and narrative queries. Adding the weighted image score to
the combined title and narrative scores increased the retrieval performance by 7.34%.</p>
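        <p>The percentages in this section read as relative changes in average precision against a baseline run. The sketch below shows that calculation under this interpretation, which the paper does not state explicitly; the function name and the precision values in the example are hypothetical and are not results from these experiments.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def relative_change_percent(baseline: float, value: float) -> float:
    """Return the change from baseline to value as a percentage of baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# Hypothetical average precision values, chosen only to illustrate the arithmetic.
example_gain = relative_change_percent(baseline=0.20, value=0.25)
example_loss = relative_change_percent(baseline=0.20, value=0.15)
```
]]></preformat>
        <p>A positive result means the second run improved on the baseline, and a negative result means it did worse.</p>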
      </sec>
      <sec id="sec-4-7">
        <title>Summary</title>
        <p>Our results demonstrate that combining the image with the text in the image collection results in better retrieval
performance than using only the text [4]. However, query expansion using a general newspaper
collection hurt the retrieval performance of the queries. We hope to find a better approach to improve the
retrieval effectiveness of combined text- and image-based retrieval.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Adriani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>C.J.</given-names>
            <surname>van Rijsbergen</surname>
          </string-name>
          .
          <article-title>Term Similarity Based Query Expansion for Cross Language Information Retrieval</article-title>
          .
          <source>In Proceedings of Research and Advanced Technology for Digital Libraries, Third European Conference (ECDL'99)</source>
          , p.
          <fpage>311</fpage>
          -
          <lpage>322</lpage>
          . Springer Verlag: Paris,
          <year>September 1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Adriani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Ambiguity Problem in Multilingual Information Retrieval</article-title>
          .
          <source>In CLEF 2000 Working Note Workshop</source>
          . Portugal,
          <year>September 2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Baeza-Yates</surname>
            ,
            <given-names>Ricardo</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Berthier</given-names>
            <surname>Ribeiro-Neto</surname>
          </string-name>
          .
          <source>Modern Information Retrieval</source>
          , New York: Addison-Wesley,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Clough</surname>
            ,
            <given-names>Paul</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Mark</given-names>
            <surname>Sanderson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Henning</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>The CLEF Cross Language Image Retrieval Track (ImageCLEF) 2004</article-title>
          .
          <source>In CLEF 2004 Working Note Workshop</source>
          . UK,
          <year>September 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>Gerard</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Michael J.</given-names>
            <surname>McGill</surname>
          </string-name>
          .
          <source>Introduction to Modern Information Retrieval</source>
          , New York: McGraw-Hill,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>