<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Recovering translation errors in cross-language image retrieval using word association models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Masashi Inoue</string-name>
          <email>m-inoue@nii.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute of Informatics</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Connecting short queries and short titles of relevant images is difficult. Different lexical expressions may be used in queries and captions that refer to the same concept. In the ImageCLEF2005 ad hoc task, we investigated the use of learned word association models that represent how pairs of words are related. We compared a precision-oriented simple word-matching retrieval model and a recall-oriented retrieval model with word association models, and we also investigated combinations of models. Experimental results on English and German topics are rather discouraging, as the use of word association models degraded performance. On the other hand, word association models help in retrieval for Japanese topics. Considering the relatively low quality of Japanese-to-English machine translation, this result may indicate that word association could play some role in recovering translation errors at the retrieval stage.</p>
      </abstract>
      <kwd-group>
        <kwd>Image retrieval</kwd>
        <kwd>word association</kwd>
        <kwd>sparse data</kwd>
        <kwd>translation error</kwd>
        <kwd>model combination</kwd>
        <kwd>result merge</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Retrieval of too many or too few documents for a query causes problems for the users of information
retrieval (IR) systems. If given too many documents, they may experience difficulties in finding
relevant documents among the results. On the other hand, if given too few documents, there
will be little chance of finding relevant information. Of the above two problems, we consider the
latter problem: insufficient retrieved results. More precisely, this problem can be divided into two
sub-problems: 1) there are not enough relevant documents stored in the database, or 2) relevant
documents are not retrieved by the system. We concentrate on the second sub-problem.</p>
      <p>In text retrieval, a shortage of retrieved documents is often caused by
term mismatch: the words in a query do not appear in most documents even though the documents are
relevant to the query. The same is essentially true of ad hoc image retrieval when captions are
used as the target of query matching. The difference is that the number of words in
image captions is sometimes quite small, so term mismatches are likely to occur more often.</p>
      <p>Figure 1: Conceptual process of translation error recovery by word association. A query &lt;ground&gt; (in a different language) is turned by machine translation into the translated query &lt;terrestrial&gt;; the word association model yields the softly expanded query &lt;soil, ground, terrestrial&gt;, which is matched against the image caption &lt;ground&gt;.</p>
      <p>One way to mitigate such mismatches is to use an enlarged query word set instead of the small
query word set supplied by users. A typical technique is query expansion, where alternative
query words are added to the original query from the document set based on relevance or
pseudo-relevance judgements.</p>
      <p>
        In ImageCLEF2005, we studied the effects of word association models by employing a kind
of probabilistic word-by-word query translation model structure [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], although in our models the
actual translation was performed by an MT system outside the retrieval model. That is, the
translation in the model is, in effect, a monolingual word expansion [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We tested our approach in a
setting where both queries and annotations were short, which is a frequently observed
characteristic of text-based image retrieval. Concerning the differences between languages, we considered only
the influence of machine translation (MT). Monolingual English-to-English, cross-lingual
German-to-English, and cross-lingual Japanese-to-English image retrieval were compared. One finding
from our experimental runs was that when the simple word matching strategy failed to retrieve
relevant images because of erroneous translations, the use of word association models could
improve the word matching. The conceptual process of translation error recovery by word association
is depicted in Figure 1. In our runs, a recovery effect was observed only in Japanese-to-English
translation, an example of translation between disparate languages.
      </p>
      <p>In the following, we first introduce the ImageCLEF2005 image collection and the pre-processing
applied to it. Second, we describe the run conditions and retrieval models used. Third, we show
the retrieval results on the submitted runs. Finally, we conclude the paper with some discussion.</p>
    </sec>
    <sec id="sec-2">
      <title>Data Preparation</title>
      <p>2.1 Test Collection</p>
      <p>
        The test collection of ImageCLEF2005 consists of 28133 images and their captions in English
from the St Andrews Library photographic collection [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Each caption has nine fields assigned
by experts. These are ‘Record ID’, ‘Short title’, ‘Long title’, ‘Location’, ‘Description’, ‘Date’,
‘Photographer’, ‘Categories’, and ‘Notes’. Such well-annotated images can be found in places such
as digital museums and commercial photo collections but are rare in other cases. That is because
casual annotators do not have enough knowledge to annotate images systematically, nor do they
have any desire to spend time on annotations. For this reason, we are motivated to retrieve less
carefully annotated images. Of the fields in the test collection, ‘short titles’ are considered to be
the simplest form of annotation. Therefore, we used only short titles for indexing. The mean
length of the short titles was 3.43 words. The distribution of lengths had a heavy tail on the short
side. The size of the vocabulary was 9883 words for the documents, and 9945 for both documents
and queries. The vocabulary contained three irregular words: ‘null’, ‘untitled’, and ‘φ’, where φ
is introduced to represent empty titles.
      </p>
      <p>Topics of retrieval were described in three fields: short descriptions (titles), long descriptions
(narratives), and example images. In our experiment, we used only short descriptions (titles),
which can be regarded as typical queries.</p>
      <p>2.2 Run Conditions</p>
      <p>
The main characteristics of our runs are summarized in Table 1. The most notable point is that
we used only the short title field. We were interested more in the exploitation of short text than
in utilization of structured and multi-faceted text. Although we were also interested in the use of
visual images, we were unable to advance to that level.</p>
      <p>Another point we should mention is the use of ‘Feedback/Expansion’. We used only expansion
and not feedback. That is, we employed neither manual relevance feedback nor pseudo-relevance
feedback. The models for the soft word expansion were built prior to querying and no candidate
word selection process was used at the querying stage. The retrieval model we used is explained
in 3.1.</p>
      <p>
        The last factor is the query language. We examined English, German, and Japanese. We
considered English topics as the baseline, German topics as the relatively ‘easy’ task, and Japanese
topics as the relatively ‘hard’ task. Here, by ‘easy’ we mean that the current state-of-the-art
accuracy of machine translation is high and retrieval can be conducted in nearly the same fashion
as with the source language. Similarly, by ‘hard’, we mean that queries differ substantially from
the original ones after going through the machine translation process. According to the results of
ImageCLEF2004, which used the same image dataset as ImageCLEF2005 but different topics,
German topics yielded the highest average MAP score after English, and Japanese topics yielded
the lowest average MAP score for the top five systems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Figure 2: Summary of the pre-processing. Captions: field selection (&lt;short title&gt;), removal of punctuation characters and extra white spaces, lowercasing of letters, indexing. Queries: field selection (&lt;title&gt;), tag removal, machine translation, removal of punctuation characters and extra white spaces, lowercasing of letters, indexing.</p>
      <p>
The pre-processing we conducted to prepare data is summarized in Figure 2. As mentioned in the
previous section, we used only the title fields of topics and short title fields of captions. Therefore,
the initial step was the extraction of those fields from the collection. For topics, titles were
surrounded by &lt;title&gt; tags; these tags were removed and the bodies of the titles were translated.
Although translation is part of the retrieval process, we explain the procedure of translation here
because we carried out translations within the process of data preparation. Our approach to
cross-language retrieval is query translation. According to previous experiments on ImageCLEF
ad hoc data, query translation generally outperforms document translation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We thought that
the combination of query translation and document translation might be promising. However, as
the starting point, we only consider query translation here. German and Japanese topics were
translated into English, the document language, using the Babelfish web-based MT system. The
translation was done manually: we entered queries in a web form and their respective translations
were returned, and punctuation and extra white spaces were removed. The results of translation
are shown in Appendix A. Punctuation in the short titles was also removed. Finally, all upper
case letters were converted to lower case, and both queries (titles) and documents (short titles)
were indexed together.
      </p>
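      <p>The text clean-up steps described above (tag removal, removal of punctuation and extra white space, lowercasing) can be sketched as follows. This is a minimal illustration rather than the code used for the runs: the function names are hypothetical, and replacing punctuation by a space instead of deleting it is an assumption, as is the handling of the typographic characters that the standard punctuation table does not cover.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
import string

# Hypothetical helpers; the paper does not publish its pre-processing code.
_TAG = re.compile(r"<[^>]+>")
# Assumption: each ASCII punctuation character is replaced by a space.
_PUNCT = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Strip markup tags, punctuation and extra white space, then lowercase."""
    text = _TAG.sub(" ", text)       # e.g. drop the <title> ... </title> tags
    text = text.translate(_PUNCT)    # punctuation characters become spaces
    text = " ".join(text.split())    # collapse runs of white space
    return text.lower()


def tokenize(text):
    """Return the index terms; an empty title maps to the symbol 'φ'."""
    words = preprocess(text).split()
    return words if words else ["φ"]
```
]]></preformat>
      <p>For example, tokenize applied to a topic title wrapped in title tags returns the lowercased words only, and an empty short title is indexed as the single placeholder term φ mentioned in the description of the vocabulary.</p>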
      <p>2.4 Qualitative Analysis of Translation Errors</p>
      <p>
The results of machine translation usually contain errors that may affect the performance of IR
at a subsequent stage, but the relationship is not straightforward. For example, when a word is
translated into ‘photographs’ when it should be translated to ‘pictures’, this difference has little
effect in understanding sentences that contain the word. Therefore, it may not be considered an
error. However, for IR, and image retrieval in particular where only short text descriptions are
available, such a difference may change the results of retrieval drastically. For instance, when all
relevant images are annotated as ‘pictures’, a query translated as ‘photographs’ cannot retrieve
them. On the other hand, when all relevant images are annotated as ‘photographs’, the
mistranslation benefits the retrieval process. Here, we analyse the results of machine translation of queries
from the point of view of their effect on IR.</p>
      <p>First, we examine the overall quality of the translations. The German-to-English translations
were generally accurate. Among 28 topics (titles), four topics were translated exactly as in the original
English – topic numbers 3, 5, 6, and 18 in A.2. This result confirms the relatively high accuracy
of German–English MT. Notable errors in German-to-English translation were related to
prepositions. For example, ‘at’ in topic 1 should be ‘on’, ‘of’ in topic 12 should be ‘from’, and ‘on’ in topic
14 should be ‘at’. Other typical errors were inappropriate assignment of imprecise synonyms. For
example, in topic 1 ‘ground’ is replaced by ‘soil’, and in topics 10 and 28 ‘picture’ is replaced by
‘photographs’. Despite these errors, in most translation results from German, the basic meanings
of topics were similar to the original English. More problematic was that three words were not
translated into English: ‘Fischer’ in topic 7, ‘puttet’ in topic 15, and ‘Portraitaunahmen’ in topic
26. For simplicity, we treated them as if they were English words. Additionally, the MT system
did not translate ‘kutsche’ into any word.</p>
      <p>For Japanese-to-English translation, the quality of translation was apparently worse (see A.3).
Some of the Japanese words could not be translated at all. Untranslated words were ‘aiona (Iona)’,
‘nabiku (waving)’, and ‘sentoandoryusu (St Andrews)’. The problem here is that the untranslated
words were often proper nouns, which might be useful for distinguishing relevant documents from
irrelevant documents. Ideally, such out-of-vocabulary words should be translated by using other
external sources, such as larger and more up-to-date dictionaries or by transliteration. In this
experiment, however, we simply eliminated such untranslated non-ASCII characters from the
translation results.</p>
      <p>In German-to-English translation, the above two proper names (Iona and St Andrews) could
be translated with no problem. This difference can be understood easily by the fact that, in
German topics, the words were spelled in the same way as in English. Therefore, no translation
was necessary. On the other hand, in Japanese topics, the translator had phonetically converted
them from the original English topics to katakana characters. Therefore, untranslated words could
not be used as is. Another factor to be considered is that phonetic transcription is not unique.
Therefore, even if there was an entry for the word in the dictionary, the back translation by MT
systems might not find a relevant entry because of the phonetic ambiguities.</p>
      <p>In addition to the above out-of-vocabulary word problems in the MT system, the
Japanese-to-English translations contained errors in prepositions similar to those in the German-to-English
translations. Errors that were peculiar to the Japanese-to-English translations were the excessive
use of definite articles and relative pronouns. We hypothesize that such translations were derived
from the design of the MT system, which was designed not for the translation of short phrases
such as titles, but for larger units of text such as paragraphs. Thus, the MT system tried to
produce natural sentences by adding definite articles and relative pronouns to fill the gaps
between grammatically disparate languages. Short query translations may require choosing either
a sophisticated MT system or simple word-by-word translation, depending on the difficulties of
translation.</p>
      <p>So far, we have discussed the quality of the topic translations assuming that the German and
Japanese topics are equivalent in content to the English topics. However, the German and
Japanese topics are themselves translations of the original English topics. Since relationships
between expressions in different languages are not one-to-one, each non-English topic used here was
one of many possible translations. Moreover, the expressions in the translated queries might not be
typical of queries in that language even if they were correct translations of typical English
queries. Therefore, the translation errors analysed above were possibly caused both by machine
translation at the retrieval stage and by translation ambiguities at the topic preparation stage. Although
this influence is not negligible, it is too involved a subject to be treated here in detail, and we can
expect that the translations by experts were far better than machine translation. We therefore do not consider
the influence of the translations made when the topics were created.</p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
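      <p>3.1 Retrieval Models</p>
      <p>We introduce retrieval models based on the unigram language model and the word association model.
The baseline model is the simple keyword-matching document model, denoted by diag. For the
query <inline-formula><tex-math>q = \{q_1, \ldots, q_K\}</tex-math></inline-formula>, the probability of <inline-formula><tex-math>q</tex-math></inline-formula> is
<disp-formula><tex-math>\prod_{k=1}^{K} P(q_k \mid d_n),</tex-math></disp-formula>
where <inline-formula><tex-math>d_n</tex-math></inline-formula> indicates the nth document or image. For the word association model, we estimated the
following transitive probabilities from the jth word to the ith word in the vocabulary:
<disp-formula><tex-math>P(w_i \mid w_j).</tex-math></disp-formula></p>
      <p>When the above two models are combined, the following model represents the process of query
generation:
<disp-formula><tex-math>\prod_{k=1}^{K} \sum_{i=1}^{V} P(q_k \mid w_i) P(w_i \mid d_n).</tex-math></disp-formula>
Here, we assume independence between query words: <inline-formula><tex-math>P(q) = \prod_{k=1}^{K} P(q_k)</tex-math></inline-formula>, although this is not
always true for the ImageCLEF2005 topics, where titles are sometimes sentential and word orders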
have meaning.</p>
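      <p>To make the difference between the keyword-matching model and the combined model concrete, the two query probabilities can be sketched as below. This is only an illustration under stated assumptions: the document model is taken to be a maximum-likelihood unigram estimate, the function names are hypothetical, and the association values are invented toy numbers that mimic the ground/terrestrial example of Figure 1.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def unigram(doc_words):
    """Assumed maximum-likelihood unigram model P(w | d_n) of one caption."""
    counts = Counter(doc_words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def likelihood_diag(query, doc_model):
    """Keyword matching: product over query words of P(q_k | d_n)."""
    p = 1.0
    for q in query:
        p *= doc_model.get(q, 0.0)
    return p


def likelihood_assoc(query, doc_model, assoc):
    """Combined model: product over k of sum_i P(q_k | w_i) P(w_i | d_n).

    assoc[(q, w)] holds P(q | w); missing pairs count as zero. Only words
    present in the caption have non-zero P(w_i | d_n), so the sum over the
    vocabulary reduces to a sum over the caption's words.
    """
    p = 1.0
    for q in query:
        p *= sum(assoc.get((q, w), 0.0) * pw for w, pw in doc_model.items())
    return p


# Toy data (illustrative values, not taken from the paper's models).
doc = unigram(["ground"])
assoc = {("terrestrial", "terrestrial"): 1.0, ("terrestrial", "ground"): 0.5}
query = ["terrestrial"]
print(likelihood_diag(query, doc))          # 0.0: no exact match
print(likelihood_assoc(query, doc, assoc))  # 0.5: matched through association
```
]]></preformat>
      <p>In this toy case the mistranslated query word shares no term with the caption, so keyword matching assigns probability zero, whereas the association between the two words lets the combined model assign a non-zero probability.</p>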
      <p>The word association models can be estimated in various ways, disregarding the statistical
justification. We tried three methods. In all three methods, we regarded the frequencies of
co-occurrence of two words as the measure of word association. If two words co-occurred, they were
assumed to be related. The first method counts self co-occurrences, where a word is regarded
as co-occurring with itself, as well as co-occurrences. Values for each term pair are estimated as
follows:
<disp-formula><tex-math>P(w_i \mid w_j) = \frac{\#(w_i, w_j)}{\#(w_j)} \quad \text{where } i \neq j \text{ and } \#(w_j) &gt; 0,</tex-math></disp-formula>
<disp-formula><tex-math>P(w_i \mid w_i) = \frac{\#(w_i, w_i) + \#(w_i)}{\#(w_i)} \quad \text{where } \#(w_i) &gt; 0.</tex-math></disp-formula></p>
      <p>Here, <inline-formula><tex-math>\#(w_i, w_j)</tex-math></inline-formula> represents the frequency of co-occurrence of <inline-formula><tex-math>w_i</tex-math></inline-formula> and <inline-formula><tex-math>w_j</tex-math></inline-formula> (appearance of the two
words in the same image caption), and <inline-formula><tex-math>\#(w_j)</tex-math></inline-formula> represents the frequency of occurrence of <inline-formula><tex-math>w_j</tex-math></inline-formula>. This
procedure strengthens self-similarities in the model and is termed cooc. The second method counts
purely co-occurring pairs and is named coocp. Values for each term pair are estimated as follows:
<disp-formula><tex-math>P(w_i \mid w_j) = \frac{\#(w_i, w_j)}{\#(w_j)}.</tex-math></disp-formula></p>
      <p>The third method normalizes the frequency of co-occurrence of <inline-formula><tex-math>w_i</tex-math></inline-formula> and <inline-formula><tex-math>w_j</tex-math></inline-formula> by the
word frequencies:
<disp-formula><tex-math>P(w_i \mid w_j) = \frac{\#(w_i, w_j)}{\#(w_i)\,\#(w_j)}.</tex-math></disp-formula>
This method is termed cooct. The baseline model that does not use word association models can
be interpreted as using a diagonal word association model with non-zero elements that are all one.
This is why we denoted it as diag.</p>
      <p>Note that these models were estimated prior to the arrival of queries and the computation at
query time focused on score calculation.</p>
      <p>
Our runs were divided into two groups according to the scoring function employed. In the first
group, documents were ranked according to the query log-likelihood of document models. As
we used unigram language models for each document, the scoring function for the nth document
given the query <inline-formula><tex-math>q</tex-math></inline-formula> is written as:
<disp-formula><tex-math>\log L = \sum_{k=1}^{K} \log P(q_k \mid d_n),</tex-math></disp-formula>
where <inline-formula><tex-math>K</tex-math></inline-formula> is the length of the query. When a word association model is used, the function becomes
<disp-formula><tex-math>\log L = \sum_{k=1}^{K} \log \sum_{i=1}^{V} P(q_k \mid w_i) P(w_i \mid d_n),</tex-math></disp-formula>
where V is the vocabulary size. Runs based on these functions were marked with log_lik.</p>
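      <p>The three co-occurrence estimators introduced earlier in this section (cooc, coocp, and cooct), which supply the association terms used in the scoring function, might be built along the following lines. This is a hedged sketch, not the code behind the runs: the function names are hypothetical, and it assumes caption-level counts, so that a word co-occurs with itself once in every caption containing it and the self co-occurrence count equals the occurrence count. The source does not spell out how self co-occurrences were counted.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from itertools import product

METHODS = ("cooc", "coocp", "cooct")


def count_statistics(captions):
    """Return (#(w), #(w_i, w_j)) counted over captions.

    Assumption: counts are caption-level, so #(w, w) == #(w).
    """
    occ = Counter()
    pair = Counter()
    for words in captions:
        present = set(words)
        occ.update(present)
        pair.update(product(present, repeat=2))  # ordered pairs, incl. (w, w)
    return occ, pair


def estimate(captions, method):
    """Build {(w_i, w_j): P(w_i | w_j)} for 'cooc', 'coocp' or 'cooct'."""
    if method not in METHODS:
        raise ValueError("unknown method: " + method)
    occ, pair = count_statistics(captions)
    table = {}
    for (wi, wj), n in pair.items():
        if method == "cooc" and wi == wj:
            table[wi, wj] = (n + occ[wi]) / occ[wi]   # strengthened self-similarity
        elif method == "cooct":
            table[wi, wj] = n / (occ[wi] * occ[wj])   # normalized by both frequencies
        else:
            table[wi, wj] = n / occ[wj]               # cooc (i != j) and coocp
    return table
```
]]></preformat>
      <p>Because the tables depend only on the captions, they can be computed once before any query arrives, which matches the remark that only score calculation takes place at query time. Note that, under the counting assumption above, the cooc values on the diagonal exceed one, so the tables are association weights rather than normalized probabilities.</p>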
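      <p>In the second group of runs, documents were ranked according to the accumulated information
for all matched words. First, we transform the variable for the probability of query word <inline-formula><tex-math>q_k</tex-math></inline-formula>, <inline-formula><tex-math>P(q)</tex-math></inline-formula>,
to <inline-formula><tex-math>F_q = e^{(\log P(q))^{-1}}</tex-math></inline-formula>, where <inline-formula><tex-math>P(q)</tex-math></inline-formula> is either <inline-formula><tex-math>P(q \mid d_n)</tex-math></inline-formula> or <inline-formula><tex-math>\sum_{i=1}^{V} P(q \mid w_i) P(w_i \mid d_n)</tex-math></inline-formula> and is considered only
when <inline-formula><tex-math>P(q) \neq 0</tex-math></inline-formula>. Then, the new scoring function can be defined as:
<disp-formula><tex-math>\log L' = \sum_{k=1}^{K} \log \frac{1}{F_{q_k}}.</tex-math></disp-formula></p>
      <p>We regard <inline-formula><tex-math>\log \frac{1}{F_{q_k}}</tex-math></inline-formula> as the information on query word <inline-formula><tex-math>q_k</tex-math></inline-formula>. A document with a higher score is assumed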
to have more information on the query. In general, when an expansion method is involved, the
number of terms matched between queries and documents increases. Consequently, the scores
of documents given by the first scoring measure log_lik are larger in models with expansion
than in those without. Thus, the first scoring measure is not suited for the comparison of output
scores between different models. The second measure was derived heuristically and is intended to
allow combining the outputs of different models. Runs based on this measure were marked with
vt_info.
      </p>
      <p>3.3 Model Output Combination</p>
      <p>
When the vt_info measure is used, the combination of different models at the output level
is simple because their scores are directly comparable. First, two sets of document scores and
corresponding document indices from two models are merged. Then, they are sorted in descending
order of scores. For each document, the higher score is retained. This assumes that lower scores
usually correspond to lack of knowledge about the documents and are thus less reliable. From the
eventual rank, the top M documents will be extracted as the final result.
      </p>
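      <p>The vt_info score and the merging procedure described above can be sketched together as follows. The sketch reads the transformed variable as F = exp(1 / log P), so that each matched word contributes -1 / log P to the score; that reading, the function names, and the rejection of probabilities outside the open interval (0, 1) are assumptions, since the text does not say how a probability of exactly one is treated.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def vt_info(word_probs):
    """Sum of log(1 / F) over matched query words, with F = exp(1 / log P).

    Assumption: every non-zero P lies strictly between 0 and 1.
    """
    score = 0.0
    for p in word_probs:
        if p == 0.0:
            continue                     # unmatched words contribute nothing
        if not 0.0 < p < 1.0:
            raise ValueError("P(q_k) must lie in (0, 1)")
        score += -1.0 / math.log(p)      # equals log(1 / F) for this word
    return score


def merge_runs(scores_a, scores_b, top_m):
    """Keep the higher score per document, sort descending, return the top M."""
    merged = dict(scores_a)
    for doc_id, s in scores_b.items():
        merged[doc_id] = max(s, merged.get(doc_id, s))
    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_m]
```
]]></preformat>
      <p>Here scores_a and scores_b would be dictionaries that map document identifiers to the vt_info scores of two models, for instance diag and cooc, and top_m plays the role of M. Taking the maximum per document is what allows one model to dominate the merged ranking whenever its scores are systematically higher.</p>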
    </sec>
    <sec id="sec-4">
      <title>Experimental Results</title>
      <p>We submitted 16 runs (files), consisting of eight for English, four for German, and four for Japanese
topics. We used wam as the group name. Each submission name was formed by concatenating
the group name, scoring function, word association model, and query language. Regarding
the word association model, dc represents the combination of diag and cooc using the method
described in 3.3. For English topics, the following runs were submitted:
wam_log_lik_diag_e
wam_log_lik_cooc_e
wam_log_lik_coocp_e
For German topics:
wam_log_lik_diag_g
wam_log_lik_cooc_g
wam_vt_info_diag_g
wam_vt_info_cooc_g
For Japanese topics:
wam_log_lik_diag_j
wam_log_lik_cooc_j
wam_vt_info_diag_j
wam_vt_info_cooc_j</p>
      <p>Each file contained 980 scores (the 35 top scores for each of 28 topics). We made a mistake
when creating the submission files and could not obtain any meaningful official results for these
submissions. The figures of mean average precision (MAP) scores in Table 2 are based on the
runs we intended to submit. They were calculated after we received the list of relevant images
for each topic (qrel file). Overall, our retrieval performance was poor. For comparison,
we included the MAP scores of the best runs from other participants, as shown at the bottom of
Table 2. They are CUHK-ad-eng-tv-kl-jm2 for English topics, R2D2vot2Ge for German topics,
and AlCimg05Exp3Jp for Japanese topics. For English runs, we also cited the MAP score of an
example run imirt0baset0enen with run conditions similar to ours (title query and short title
index).</p>
      <p>Having observed the difference between our runs and others, we may now turn to the analysis
of our own runs. In English, our best run was actually the diag model, which we had considered
as the simplest baseline. In contrast, all models with word association underperformed. There are
two possible explanations for this result. First, there was no need to relax the limitation of exact
term matching. Some relevant documents could be retrieved by word-by-word correspondence
and other relevant documents could never be reached by word-level expansion. Second, the word
association models were not learned adequately, so they could not help with connecting query
words and document words. To clarify which of these two reasons led to this result, we must
analyse the data set further. This relationship between the diag model and other cooc-type
models was the same for the German topics. When the vt_info scoring function was used, an
important observation is that the MAP scores for cooc and cooct were the same and those for
diag and dc were nearly the same. By analysing the performances for individual topics, we found
that cooc and cooct behaved in the same way.</p>
      <p>Further analysis is required to understand this phenomenon. For dc, by analysing the influences
of the two models, we observed that the diag model dominated the top scores. We had expected
this tendency, because an exact-matching scheme should have higher confidence in its outputs
when queries can find their counterparts. What was unexpected was that the dominance of the
diag model often ranged from the top rank to about the 1000th rank, and scores given by cooc
models appeared only in lower ranks. Even though the ranking was determined by the interlaced
ranks from both models, because we had submitted only the top 35 ranks, the resulting MAP
scores were determined almost solely by the diag model. This outcome was not desirable. For
topics 2 and 18, the cooc models worked better than the diag models. Nevertheless, as explained
above, the benefit of the cooc models was not taken into account in the final results of the dc
method. We must consider a better way of rank merging so as not to miss such opportunities.</p>
      <p>As we can see in Table 2, the trends of model discrepancy were similar in English and German
topics. However, in Japanese topics, the use of word association models (cooc) improved
performance in both scoring functions. For an explanation of this reversal effect, we can consider the
quality of translation. In the diag model, when English and Japanese topics were compared, the
retrieval performance simply degraded as the translation quality degraded. In contrast, the word
association models might provide some improvement. It may be considered as recovering from
the translation errors that caused mismatches in the retrieval process. However, the relationship
between translation quality and the positive effects of word association models was not simple
because it was not monotonic. When comparing diag and cooc in English and German topics,
even though German topics contained some translation errors, the degradation of performance by
using cooc was more severe in German than in English. This problem may be better understood by
considering additional languages with varying translation difficulties.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>In our runs, we observed that the use of word association models might help recover query
translation errors given by MT systems. However, because our system performed quite poorly in terms
of MAP scores, it is difficult to generalize this finding. We need to improve the baseline models
to some reasonable level. In the pre-processing and the retrieval models we employed, we did not
consider the following three factors that are important to IR performance: 1) idf, 2) stop words,
and 3) document length. Incorporation of these factors into the modelling may be the first step
towards obtaining a reasonable performance.</p>
      <p>If the use of word association models in cross-language retrieval is beneficial for mitigating
the effect of translation errors, a similar effect should be observed in other types of expansion
techniques. Although we do not know the details of expansion techniques used by other
participants, it seems that the use of ‘Expansion/Feedback’ techniques improved performance in most
languages. We would like to see if these expansion techniques at query time serve as a more
powerful component of retrieval when translation is erroneous than when translation is error free.</p>
      <p>Another direction of interest lies in the design of MT systems. In our runs, we used an MT
system with a single output. If we had used an MT system with multiple candidate outputs with
their confidence scores, the system would have performed the soft expansion by itself. It is not
clear whether using such MT systems with our models will improve or degrade the retrieval results.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
      <p>Text-based image retrieval that relies on short descriptions such as titles is considered to be less
robust to translation errors. In the experiments on the ad hoc task in ImageCLEF2005, word
association models helped with the retrieval of Japanese topics when translation into English
using the machine translation system was quite erroneous. We hypothesize that this could be
explained by the recovery effect given by word expansion. The above argument might be verified
by comparing various languages with different degrees of difficulty in translation into English.</p>
      <p>Two important extensions we could not investigate were the utilization of visual information
and the exploitation of training data sets. We are particularly interested in how the use of these
will help retrieval for difficult topics in which visual or contextual information plays a vital role.</p>
    </sec>
    <sec id="sec-7">
      <title>Results of query translation</title>
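      <p>A.1 English (Original) Queries</p>
      <p>A.2 German to English Translations</p>
      <p>A.3 Japanese to English Translations</p>
      <list list-type="order">
        <list-item><p>Terrestrial airplane</p></list-item>
        <list-item><p>The people who meet in the field music hall</p></list-item>
        <list-item><p>The dog which sits down</p></list-item>
        <list-item><p>The steam ship which is docked to the pier</p></list-item>
        <list-item><p>Image of animal</p></list-item>
        <list-item><p>Small-sized sailing ship</p></list-item>
        <list-item><p>Fishermen on boat</p></list-item>
        <list-item><p>The building which the snow accumulated</p></list-item>
        <list-item><p>The horse which pulls the load carriage and the carriage</p></list-item>
        <list-item><p>Photograph of sun Scotland</p></list-item>
        <list-item><p>The Swiss mountain scenery</p></list-item>
        <list-item><p>The illustrated postcards of Scotland and island</p></list-item>
        <list-item><p>The elevated bridge of the stonework which is plural arch</p></list-item>
        <list-item><p>People of market</p></list-item>
        <list-item><p>The golfer who does the pad with the green</p></list-item>
        <list-item><p>The wave which washes in the beach</p></list-item>
        <list-item><p>The man or the woman who reads</p></list-item>
        <list-item><p>Woman of white dress</p></list-item>
        <list-item><p>Illustrated postcards of the synthesis of province</p></list-item>
        <list-item><p>The Scottish visit of king family other than fife</p></list-item>
        <list-item><p>Poet Robert Burns’s monument</p></list-item>
        <list-item><p>Flag building</p></list-item>
        <list-item><p>Grave inside church and large saintly hall</p></list-item>
        <list-item><p>Close-up photograph of bird</p></list-item>
        <list-item><p>Gate of arch type</p></list-item>
        <list-item><p>Portrait photograph of man and woman mixed group</p></list-item>
        <list-item><p>The woman or the girl who has the basket</p></list-item>
        <list-item><p>Color picture of forest scenery of every place</p></list-item>
      </list>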
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mueller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanderson</surname>
          </string-name>
          .
          <article-title>The CLEF cross language image retrieval track (ImageCLEF) 2004</article-title>
          .
          <source>In ImageCLEF2004 Working Note</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Paul</given-names>
            <surname>Clough</surname>
          </string-name>
          .
          <article-title>Caption vs. query translation for cross-language image retrieval</article-title>
          .
          <source>In ImageCLEF2004 Working Note</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Masashi</given-names>
            <surname>Inoue</surname>
          </string-name>
          and
          <string-name>
            <given-names>Naonori</given-names>
            <surname>Ueda</surname>
          </string-name>
          .
          <article-title>Retrieving lightly annotated images using image similarities</article-title>
          .
          <source>In SAC '05: Proceedings of the 2005 ACM symposium on Applied computing</source>
          , pages
          <fpage>1031</fpage>
          -
          <lpage>1037</lpage>
          , NY, USA, March
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Wessel</given-names>
            <surname>Kraaij</surname>
          </string-name>
          and
          <string-name>
            <given-names>Franciska</given-names>
            <surname>de Jong</surname>
          </string-name>
          .
          <article-title>Transitive CLIR models</article-title>
          .
          <source>In RIAO</source>
          , pages
          <fpage>69</fpage>
          -
          <lpage>81</lpage>
          , Vaucluse, France, April 26-28,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>