
University of Indonesia Participation at IMAGE-CLEF 2005

Mirna Adriani and Framadhan
Faculty of Computer Science, University of Indonesia, Depok 16424, Indonesia

We present a report on our participation in the Indonesian-English ad hoc image retrieval task of the 2005 Cross-Language Evaluation Forum (CLEF). We translated the Indonesian query set into English using a commercial machine translation tool called Transtool. We used an approach that combines the retrieval results of the query on text and on image, and we show that this combination yields some improvement in retrieval effectiveness, whereas the query expansion technique we applied did not.

image retrieval, cross-language information retrieval, machine translation, query expansion

1. Introduction

This year we, the University of Indonesia IR group, participated in the bilingual ad hoc task of ImageCLEF at the Cross Language Evaluation Forum (CLEF) 2005, i.e., Indonesian-English CLIR. We used commercial machine translation software called Transtool to translate an Indonesian query set into English. We learned from our previous work [1, 2] that freely available dictionaries on the Internet failed to provide correct translations for many query terms, as their vocabulary was very limited. We hoped that we could improve the results using machine translation.

2.2. Combining the Scores of the Text and the Image

The short caption attached to each image in the collection was indexed using Lucene², an open source indexing and retrieval engine, and the image collection was indexed using GIFT³. We combined the scores of the text and the image retrieval in order to get a better result. The text was given more weight because the image retrieval effectiveness that we obtained using GIFT was poor. We used the two example images given by CLEF and ran them as query-by-example through GIFT to search the collection. We combined the color histogram, the texture histogram, the color block, and the texture block features in order to get the images that are most similar to the two examples. The text score was given a weight of 0.8 and the image score a weight of 0.2. These weights were chosen after comparing a number of different weight configurations in our initial experiments.
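The weighted combination described above can be illustrated with a minimal sketch. The paper does not state how the Lucene and GIFT scores were brought onto a common scale, so the min-max normalization here is an assumption, as are the function names and the treatment of images missing from one result list (they contribute a score of 0 for that modality). Only the 0.8 and 0.2 weights come from the text.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; an assumed step, not stated in the paper."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so every document gets the same normalized score.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine_scores(
    text_scores: Dict[str, float],
    image_scores: Dict[str, float],
    text_weight: float = 0.8,   # weight reported in the paper
    image_weight: float = 0.2,  # weight reported in the paper
) -> List[Tuple[str, float]]:
    """Return (image id, combined score) pairs, best first."""
    text_norm = min_max_normalize(text_scores)
    image_norm = min_max_normalize(image_scores)
    doc_ids = set(text_norm) | set(image_norm)
    combined = {
        doc_id: text_weight * text_norm.get(doc_id, 0.0)
        + image_weight * image_norm.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Sort by descending score; break ties by id for a stable ranking.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

With these weights, an image that ranks well on its caption but poorly on visual similarity still stays near the top, which matches the stated motivation for favoring the text score.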

3. Experiment

The image collection contains 28,133 images from the St. Andrews image collection, each with a short caption in English. We participated in the bilingual task using Indonesian query topics. We opted to use the query title and the narrative for all of the 28 available topics. The query translation process was performed fully automatically using the Transtool machine translation software.

We then applied the pseudo relevance-feedback query-expansion technique to the translated queries. We used the top 20 documents from the Glasgow Herald collection to extract the expansion terms.
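As a rough illustration of pseudo relevance feedback, the sketch below treats the top-ranked documents as relevant and appends their most widespread terms to the query. The paper specifies only that the top 20 documents were used; the term-selection criterion (number of feedback documents containing the term), the number of added terms, the tokenizer, and the small stopword list are all assumptions made for this example.

```python
import re
from collections import Counter
from typing import FrozenSet, List

# A tiny illustrative stopword list (hypothetical, not the one used in the paper).
STOPWORDS: FrozenSet[str] = frozenset(
    {"a", "an", "and", "by", "in", "of", "on", "the", "to"}
)


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def expand_query(
    query: str,
    ranked_docs: List[str],
    top_docs: int = 20,   # feedback depth reported in the paper
    num_terms: int = 5,   # assumed number of expansion terms
) -> List[str]:
    """Append the terms found in the most feedback documents to the query."""
    query_terms = tokenize(query)
    seen = set(query_terms)
    counts: Counter = Counter()
    for doc in ranked_docs[:top_docs]:
        # Count each candidate term once per feedback document.
        for term in set(tokenize(doc)):
            if term not in seen and term not in STOPWORDS:
                counts[term] += 1
    # Highest count first; alphabetical order breaks ties deterministically.
    best = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:num_terms]
    return query_terms + [term for term, _ in best]
```

Because the feedback documents here come from a newspaper collection rather than from the image captions, the added terms may drift away from the vocabulary of the captions.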

In these experiments, we used the Lucene information retrieval system to index and retrieve image captions (text).

4. Results

Our work focused on the bilingual task using Indonesian queries to retrieve images from the image collection. The machine translation tool failed to translate 3 words in the titles and 8 words in the narratives. In particular, it failed to translate Indonesian names of places or locations such as Skotlandia (Scotland), Swis (Swiss), and Irlandia (Ireland) into English. The average number of words in the Indonesian queries was largely the same as in the resulting English versions.

Taking the image score into account, in addition to the text, showed some improvement. For the title-based retrieval, the image score increased the average retrieval precision by 7.9%; for the narrative-based retrieval, the image score increased the average retrieval precision by 11.22%. However, the query expansion technique did not improve the retrieval performance. It decreased the retrieval performance of the title-only retrieval by 30.01% and of the narrative-only retrieval by 10.94%.

² See http://lucene.apache.org/.
³ See http://savannah.gnu.org/projects/gift/.

Task: Bilingual

Query configurations: Title + Expansion; Title + Image; Title + Narrative + Expansion; Title + Narrative + Image; Narrative; Narrative + Expansion; Narrative + Image

P/R: 0.2122

The retrieval effectiveness of combining title and narrative was 1.88% worse than that of the title-only retrieval, but 14.45% better than that of the narrative-only retrieval. Query expansion decreased the retrieval performance of the combined title and narrative queries by 7.25%. Adding the weighted image score to the combined title and narrative scores increased the retrieval performance by 7.34%.
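The percentages above read most naturally as relative changes in average precision with respect to a baseline run. Assuming that interpretation (the paper does not state the formula), such a figure can be computed as in the sketch below. The function name is hypothetical, and the values in the usage example are invented for illustration, not taken from the results.

```python
def relative_change_percent(baseline: float, value: float) -> float:
    """Relative change of `value` with respect to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# Illustrative values only: a run that raises average precision from 0.20
# to 0.25 is a 25% relative improvement; one that lowers it to 0.15 is a
# 25% relative decrease.
improvement = relative_change_percent(0.20, 0.25)
decrease = relative_change_percent(0.20, 0.15)
```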

5. Summary

Our results demonstrate that combining the image with the text in the image collections results in better retrieval performance than using the text alone [4]. However, query expansion using a general newspaper collection hurt the retrieval performance of the queries. We hope to find a better approach to improve the retrieval effectiveness of the combined text and image-based retrieval.

References

1. Adriani, M. and C.J. van Rijsbergen. Term Similarity Based Query Expansion for Cross Language Information Retrieval. In Proceedings of Research and Advanced Technology for Digital Libraries, Third European Conference (ECDL'99), p. 311-322. Springer Verlag: Paris, September 1999.

2. Adriani, M. Ambiguity Problem in Multilingual Information Retrieval. In CLEF 2000 Working Note Workshop. Portugal, September 2000.

3. Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto. Modern Information Retrieval. New York: Addison-Wesley, 1999.

4. Clough, Paul, Mark Sanderson, and Henning Müller. The CLEF Cross Language Image Retrieval Track (ImageCLEF) 2004. In CLEF 2004 Working Note Workshop. UK, September 2004.

5. Salton, Gerard, and Michael J. McGill. Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1983.