<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using pseudo-relevance feedback to improve image retrieval results</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mouna Torjmen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Karen Pinel-Sauvagnat</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohand Boughanem</string-name>
          <email>bougha@irit.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IRIT</institution>
          ,
          <addr-line>118 Route Narbonne-31062 Toulouse Cedex 4 -</addr-line>
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2007</year>
      </pub-date>
      <abstract>
        <p>In this paper, we propose a pseudo-relevance feedback method for the photographic retrieval and medical retrieval tasks of ImageCLEF 2007. The aim of our participation in ImageCLEF is to evaluate a combination method using both English textual queries and image queries to answer topics. The approach processes image queries and merges the results with those of textual queries in order to improve retrieval. Using only textual information and queries, we did not obtain good results. To process image queries, we used the FIRE system to rank similar images using low-level features, and we then used the textual information associated with the top images to construct a new textual query. Results showed the interest of low-level features for processing image queries, as performance increased compared to textual query processing alone. Finally, the best results were obtained by combining the result lists of textual query processing and image query processing with a linear function.</p>
      </abstract>
      <kwd-group>
        <kwd>Image retrieval</kwd>
        <kwd>pseudo-relevance feedback</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>1. The context of an image is all information about the image coming from sources other than
the image itself. For the time being, only textual information is used as context. The main
problem of this approach is that documents can use different words to describe the same
image, or the same words to describe different concepts. Moreover, image queries
cannot be processed.
2. Content-Based Image Retrieval (CBIR) systems use low-level image features to return images
similar to an example image. The main problem of this approach is that visual similarity
does not always correspond to semantic similarity (for example, a CBIR system can return
a picture of blue sky when the example image is a blue car).</p>
      <p>
        Most image retrieval systems nowadays combine content and context retrieval, in order
to take advantage of both methods. Indeed, it has been shown that combining text- and
content-based methods for image retrieval consistently improves performance [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Images and textual information can be considered as independent, and the content and contextual
information of queries can be combined in different ways:</p>
      <p>
        Image queries and textual queries can be processed separately, and the two result lists are
then merged using a linear function [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        One can also use a pipeline approach: a first search is done using textual information or
content information, and a filtering step is then applied using the other information type
to exclude non-relevant images [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Other methods use Latent Semantic Analysis (LSA) techniques to combine visual and textual
information, but are not efficient [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        Some other works propose translation-based methods, in which content and context information
are complementary. The main idea is to extract relations between images and text, and to use
them to translate textual information into visual information and vice versa [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]:
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the authors translate textual queries into visual ones. The
authors of [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] propose to translate image queries to textual ones, and to process them using
textual methods. Results are then merged with those obtained with textual queries. The authors
of [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] also propose to expand the initial textual query with terms extracted thanks to an image
query.
      </p>
      <p>For the latter methods, the main problem in constructing a new textual query or expanding an initial
textual query is term extraction. The main solution for this is pseudo-relevance feedback. Using
pseudo-relevance feedback in context-based image retrieval to process image queries is slightly
different from classic pseudo-relevance feedback. The first step is to use a visual system to process
image queries. The images obtained as results are considered relevant, and their associated textual
information is then used to select terms in order to express a new textual query.</p>
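      <p>This feedback loop can be sketched as follows (a minimal Python sketch; visual_search and associated_text are hypothetical placeholders for a CBIR system such as FIRE and for the image-to-document mapping, and the term weighting shown is deliberately simple):</p>

```python
from collections import Counter

def image_query_to_text(image_query, visual_search, associated_text, k, l):
    """Classic PRF adapted to image queries: assume the top-k visually
    similar images are relevant, and build a textual query from the
    terms of their associated documents."""
    top_images = visual_search(image_query)[:k]
    docs = [associated_text(img) for img in top_images
            if associated_text(img) is not None]
    # Simple term weighting: overall frequency in the feedback documents.
    weights = Counter(t for d in docs for t in d)
    return [t for t, _ in weights.most_common(l)]

# Toy stand-ins for the visual system and the image-to-text mapping.
visual_search = lambda q: ["img1", "img2", "img3"]
associated_text = {"img1": ["south", "korea", "river"],
                   "img2": ["south", "korea"],
                   "img3": None}.get  # img3 has no associated text
new_query = image_query_to_text("query.jpg", visual_search,
                                associated_text, k=3, l=2)
```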
      <p>
        The work presented in this paper also proposes to combine context and content information
to answer the photographic retrieval and medical retrieval tasks. More precisely, we present a
method to transform image queries into textual ones. We use XFIRM [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], a structured information
retrieval system, to process English textual queries, and the FIRE system [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to process image queries.
The documents corresponding to the images returned by FIRE are used to extract terms that form
a new textual query.
      </p>
      <p>
        The paper is organized as follows. In Section 2, we describe textual query processing using
the XFIRM system. In Section 3, we describe image query processing, using in a first step the
FIRE system, and in a second step a pseudo-relevance feedback method. In Section 4, we present
our combination method, which uses the results of both the XFIRM and FIRE systems. Experiments
and results for the two tasks (medical retrieval and photographic retrieval [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) are presented in
Section 5. Finally, we conclude in Section 6.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Textual queries processing</title>
      <p>
        Textual information of collections used for the photographic and medical retrieval tasks [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is
organised using XML. In the indexing phase, we decided to use only the document
elements containing positive information: description, title, notes and location.
We then used the XFIRM system [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] to process queries. XFIRM (XML Flexible Information
Retrieval Model) uses a relevance propagation method to process textual queries in XML documents.
Relevance values are first computed on leaf nodes (which contain the textual information), and scores
are then propagated along the document tree to evaluate the relevance values of inner nodes.
      </p>
      <p>Let q = t1, ..., tn be a textual query composed of n terms. Relevance values of leaf nodes ln
are computed thanks to a similarity function RSV(q, ln):</p>
      <p>RSV(q, ln) = Σ_{i=1..n} w_iq · w_iln, where w_iq = tf_iq and w_iln = tf_iln · idf_i · ief_i   (1)</p>
      <p>w_iq and w_iln are the weights of term i in query q and leaf node ln respectively. tf_iq and tf_iln are the
frequencies of i in q and ln, idf_i = log(|D| / (|d_i| + 1)) + 1, with |D| the total number of documents
in the collection and |d_i| the number of documents containing i, and ief_i is the inverse element
frequency of term i, i.e. log(|N| / |nf_i| + 1) + 1, where |nf_i| is the number of leaf nodes containing
i and |N| is the total number of leaf nodes in the collection.
idf_i models the importance of term i in the collection of documents, while ief_i
models it in the collection of elements.</p>
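      <p>As an illustration, equation 1 can be sketched as follows (a minimal Python sketch with hypothetical toy data; the collection statistics idf_i and ief_i are assumed to be precomputed):</p>

```python
from collections import Counter

def rsv(query_terms, leaf_terms, idf, ief):
    """Equation 1: RSV(q, ln) = sum_i w_iq * w_iln,
    with w_iq = tf_iq and w_iln = tf_iln * idf_i * ief_i."""
    tf_q = Counter(query_terms)
    tf_ln = Counter(leaf_terms)
    score = 0.0
    for term, w_iq in tf_q.items():
        w_iln = tf_ln[term] * idf.get(term, 0.0) * ief.get(term, 0.0)
        score += w_iq * w_iln
    return score

# Hypothetical precomputed statistics for a toy collection.
idf = {"korea": 2.1, "river": 1.7, "vehicle": 1.9}
ief = {"korea": 2.8, "river": 2.3, "vehicle": 2.5}

leaf = "river boat in south korea".split()
score = rsv(["vehicle", "korea"], leaf, idf, ief)  # only "korea" matches
```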
      <p>Each node n in the document tree is then assigned a relevance score r_n, which is a function of the
relevance scores of the leaf nodes it contains and of the relevance value of the whole document:</p>
      <p>r_n = ρ · |L^r_n| · Σ_{ln_k ∈ L_n} α^(dist(n, ln_k) − 1) · RSV(q, ln_k) + (1 − ρ) · r_root   (2)</p>
      <p>dist(n, ln_k) is the distance between node n and leaf node ln_k in the document tree, i.e. the number
of arcs that are necessary to join n and ln_k, and α ∈ ]0..1] adapts the importance of the
dist parameter. In all the experiments presented in this paper, α is set to 0.6.</p>
      <p>
        L_n is the set of leaf nodes that are descendants of n, and |L^r_n| is the number of leaf nodes in L_n having
a non-zero relevance value (according to equation 1). ρ ∈ ]0..1], inspired by the work presented in
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], allows the introduction of document relevance in the evaluation of inner node relevance, and r_root
is the relevance score of the root element, i.e. the relevance score of the whole document, evaluated
with equation 2 with ρ = 1.
      </p>
      <p>Finally, the documents d_j containing relevant nodes are retrieved with the following relevance
score:</p>
      <p>r_xfirm(d_j) = max_{n ∈ d_j} r_n   (3)</p>
      <p>The images associated with the retrieved documents are lastly returned by the system to answer the retrieval
tasks.</p>
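      <p>The propagation of equations 2 and 3 can be sketched as follows (a minimal Python sketch over a hypothetical toy node, with α = 0.6 as in the paper; leaf scores and distances are made-up values):</p>

```python
def node_score(leaf_scores, dists, rho, alpha, r_root):
    """Equation 2: r_n = rho * |L^r_n| * sum_k alpha^(dist - 1) * RSV_k
                         + (1 - rho) * r_root."""
    relevant = [s for s in leaf_scores if s > 0]     # |L^r_n|
    propagated = sum(alpha ** (d - 1) * s for s, d in zip(leaf_scores, dists))
    return rho * len(relevant) * propagated + (1 - rho) * r_root

# Toy inner node with two leaf descendants at distances 1 and 2;
# the same leaves sit at distances 2 and 3 from the root.
leaf_scores = [5.88, 0.0]
# Root score: equation 2 evaluated with rho = 1.
r_root = node_score(leaf_scores, [2, 3], rho=1.0, alpha=0.6, r_root=0.0)
r_n = node_score(leaf_scores, [1, 2], rho=0.8, alpha=0.6, r_root=r_root)
# Equation 3: a document's score is the max over its nodes' scores.
r_doc = max(r_n, r_root)
```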
    </sec>
    <sec id="sec-3">
      <title>Image queries processing</title>
      <p>
        To process image queries, we used a three-step method: (1) a first step processes images using
the FIRE system [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], (2) we then use pseudo-relevance feedback to construct new textual queries,
(3) the new textual queries are processed with the XFIRM system.
      </p>
      <p>We first used the FIRE system to get the top K images similar to the image query. We then got
the N associated textual documents (with N ≤ K, because some images do not have associated
textual information) and extracted the top L terms from them. To select the top L terms, we
evaluated two formulas to express the weight wi of term ti.</p>
      <p>The first formula uses the frequency of term ti in the N documents:</p>
      <p>w_i = Σ_{j=1..N} tf_ij   (4)</p>
      <p>where tf_ij is the frequency of term t_i in document d_j.</p>
      <p>The second formula uses the term frequency in the N selected documents, the number of the N
selected documents containing the term, and a normalized idf of the term in the whole
collection:</p>
      <p>w_i = [1 + log(Σ_{j=1..N} tf_ij)] · (n_i / N) · (log(D / d_i) / log(D))   (5)</p>
      <p>where n_i is the number of documents among the N associated documents containing term t_i, D is
the total number of documents in the collection and d_i is the number of documents in the collection
containing t_i.</p>
      <p>The use of the n_i/N parameter is based on the following idea: a term occurring once in n
documents is more important, and should be considered more relevant, than a term occurring n times in one
document. The log function is applied to Σ_{j=1..N} tf_ij because without it, results with or without the
n_i/N parameter were almost the same.</p>
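      <p>Both term-weighting formulas can be sketched as follows (a minimal Python sketch; the document contents and the collection statistics D and d_i are hypothetical toy values, and d_i is held constant here only to keep the example short):</p>

```python
import math
from collections import Counter

def weight_eq4(term, docs):
    """Equation 4: w_i = sum over the N documents of tf_ij."""
    return sum(Counter(d)[term] for d in docs)

def weight_eq5(term, docs, D, d_i):
    """Equation 5: w_i = [1 + log(sum_j tf_ij)] * (n_i / N)
                         * log(D / d_i) / log(D)."""
    total_tf = sum(Counter(d)[term] for d in docs)
    if total_tf == 0:
        return 0.0
    n_i = sum(1 for d in docs if term in d)  # docs containing the term
    return ((1 + math.log(total_tf)) * (n_i / len(docs))
            * math.log(D / d_i) / math.log(D))

# N = 3 associated documents (toy contents), collection of D = 1000 docs.
docs = [["south", "korea", "river"],
        ["south", "korea", "night"],
        ["south", "south", "forklift"]]
w4 = weight_eq4("south", docs)                  # total frequency: 4
w5 = weight_eq5("south", docs, D=1000, d_i=50)
top = sorted({t for d in docs for t in d},
             key=lambda t: weight_eq5(t, docs, 1000, 50), reverse=True)
```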
      <p>We then construct a new textual query with the top L terms selected according to formula 4
or 5, and we process it using the XFIRM system (as explained in Section 2).</p>
      <p>In the photographic retrieval task, we obtained the following queries for topic Q48, with K = 5
and L ≤ 5:
Textual query using equation 4: "south korea river"
Textual query using equation 5: "south korea night forklift australia"</p>
      <p>The original textual query in English was "vehicle in South Korea". As we can see, the query
using equation 5 is more similar to the original query than the one using equation 4.</p>
    </sec>
    <sec id="sec-4">
      <title>Combination function</title>
      <p>To evaluate the interest of using both content and context information, we combined the results of
image query and textual query processing, and we evaluated new relevance scores r(d_j) for
documents d_j:</p>
      <p>r(d_j) = λ · r_xfirm(d_j) + (1 − λ) · r_PRF(d_j)   (6)</p>
      <p>where r_xfirm(d_j) is the relevance score of document d_j according to the XFIRM system (equation
3) and r_PRF(d_j) is the relevance score of d_j according to the XFIRM system after image query
processing (see Section 3).</p>
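      <p>The merging step of equation 6 can be sketched as follows (a minimal Python sketch; document identifiers and scores are hypothetical, and a document missing from one list is given score 0 in that list):</p>

```python
def combine(r_xfirm, r_prf, lam):
    """Equation 6: r(d_j) = lam * r_xfirm(d_j) + (1 - lam) * r_PRF(d_j)."""
    docs = set(r_xfirm) | set(r_prf)
    return {d: lam * r_xfirm.get(d, 0.0) + (1 - lam) * r_prf.get(d, 0.0)
            for d in docs}

# Toy result lists: document id -> relevance score.
r_xfirm = {"doc1": 0.9, "doc2": 0.4}
r_prf = {"doc2": 0.8, "doc3": 0.6}
merged = combine(r_xfirm, r_prf, lam=0.9)
ranking = sorted(merged, key=merged.get, reverse=True)
```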
      <p>In order to answer both retrieval tasks, we then return all images associated with the top-ranked
documents.</p>
      <p>Figure 1 illustrates our approach.</p>
      <p>[Figure 1: overview of the approach. Textual queries are processed on the whole collection by the XFIRM system, producing documents and their associated relevance scores. Image queries are processed by the FIRE system to obtain the top K images; their XML associated text is used to build a new textual query of L terms, which is in turn processed by the XFIRM system. The two lists of documents and scores are merged with a linear combination function to produce the final document results, and the images associated with these documents form the final image results.]</p>
    </sec>
    <sec id="sec-5">
      <title>Evaluation and results</title>
      <p>Photographic Retrieval Task</p>
      <p>Evaluation of textual queries</p>
      <p>We evaluated English textual queries using the XFIRM system with parameters α = 0.9 and
ρ = 1. Results, which are almost the same, are presented in Table 1.</p>
      <p>[Table 1: results of runs RunText0609 and RunText061.]</p>
      <p>We notice that the use of term frequency in the selected documents is not enough, and that the
importance of the term in the collection needs to be used in the term weighting function (results
are better with equation 5 than with equation 4).</p>
      <p>If we now compare table 1 and table 2, we see that processing image queries with the FIRE
system and our pseudo-relevance feedback method gives better results than using only the XFIRM
system on textual queries. This shows the importance of visual features for retrieving images.
For this task, we only evaluated the combination method described in Section 4. RunComb09 uses
equation 5 with ρ = 1, K = 15, L = 10 and λ = 0.9.</p>
      <p>RunComb05 uses equation 4 with ρ = 1, K = 6, L = 5 and λ = 0.5.</p>
      <p>Results are significantly better for run RunComb09. However, as many parameters are involved
(K, L, and the equation used to select terms), it is difficult to conclude which parameters
impact the results. Further experiments are thus needed.</p>
    </sec>
    <sec id="sec-6">
      <title>Discussion</title>
      <p>Increasing the amount of textual information used to construct new textual queries from
image queries improves results: the number K of images selected from the FIRE results has a great
impact on results, and increasing K improves them by introducing relevant information.
Another factor influencing results is the number L of new query terms. In our experiments,
when K and L increase, the MAP metric also increases.</p>
      <p>Moreover, processing textual queries or images separately does not yield the best results:
combining the two sources of evidence clearly improves results.</p>
      <p>Finally, we would like to conclude with the type of textual information used. In the Medical and
Photographic Retrieval Tasks, textual information is encoded in XML, and as a
consequence, we decided to use an XML-oriented information retrieval system (XFIRM) to process textual
queries. However, elements are not organized in a hierarchical way, as can be the case
in XML documents (there are no ancestor-descendant relationships between nodes), and the functions used
by the XFIRM system to evaluate node relevance may not be appropriate in that case. Other
experiments are consequently needed with a plain-text information retrieval system. Combining
the XFIRM system with the FIRE system may however be interesting for fully XML-encoded
collections.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion and future work</title>
      <p>We participated in the Photographic and Medical Retrieval Tasks of ImageCLEF 2007 in order to
evaluate a method using a content- and context-based approach to answer topics. We proposed a
new pseudo-relevance feedback approach to process image queries, and we tested an XML-oriented
system to process textual queries. Results showed the interest of combining the two sources of
evidence (content and context) for image retrieval.</p>
      <p>In future work, we plan to:</p>
      <p>
        Add low-level feature results extracted from FIRE to the combination function in the
Medical Retrieval Task, as visual features are very important in the medical domain.
Sort images using concept-level features [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] instead of low-level features to construct new
textual queries in the Photographic Retrieval Task.
      </p>
      <p>Use a specific domain ontology to expand textual queries (original textual queries and queries
obtained with our pseudo-relevance feedback approach).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Susanne</given-names>
            <surname>Boll</surname>
          </string-name>
          , Wolfgang Klas, and
          <string-name>
            <given-names>Jochen</given-names>
            <surname>Wandel</surname>
          </string-name>
          .
          <article-title>A cross-media adaptation strategy for multimedia presentations</article-title>
          .
          <source>In ACM Multimedia (1)</source>
          , pages
          <fpage>37</fpage>
          -
          <lpage>46</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Yih-Chen</surname>
            <given-names>Chang</given-names>
          </string-name>
          , Wen-Cheng
          <string-name>
            <surname>Lin</surname>
          </string-name>
          , and
          <string-name>
            <surname>Hsin-Hsi Chen</surname>
          </string-name>
          .
          <article-title>A corpus-based relevance feedback approach to cross-language image retrieval</article-title>
          .
          <source>In CLEF</source>
          , pages
          <fpage>592</fpage>
          -
          <lpage>601</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Deselaers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Keysers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Ney</surname>
          </string-name>
          .
          <article-title>FIRE - flexible image retrieval engine: ImageCLEF 2004 evaluation</article-title>
          .
          <source>In CLEF Workshop</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Deselaers</surname>
          </string-name>
          , Henning Müller, Paul Clough, Hermann Ney, and
          <string-name>
            <given-names>Thomas M.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          .
          <article-title>The clef 2005 automatic medical image annotation task</article-title>
          .
          <source>International Journal of Computer Vision</source>
          ,
          <volume>74</volume>
          (
          <issue>1</issue>
          ):
          <fpage>51</fpage>
          -
          <lpage>58</lpage>
          ,
          <year>August 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Fuhr</surname>
          </string-name>
          , Mounia Lalmas,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malik</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Kazai</surname>
          </string-name>
          .
          <source>INEX 2005 workshop proceedings</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Grubinger</surname>
          </string-name>
          , Paul Clough, Allan Hanbury, and Henning Müller.
          <article-title>Overview of the ImageCLEF 2007 photographic retrieval task</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Gareth</surname>
            <given-names>J. F.</given-names>
          </string-name>
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>Michael</given-names>
          </string-name>
          <string-name>
            <surname>Burke</surname>
          </string-name>
          , John Judge, Anna Khasin,
          <string-name>
            <surname>Adenike M. Lam-Adesina</surname>
            , and
            <given-names>Joachim</given-names>
          </string-name>
          <string-name>
            <surname>Wagner</surname>
          </string-name>
          . Dublin City University at CLEF 2004:
          <article-title>Experiments in monolingual, bilingual and multilingual retrieval</article-title>
          .
          <source>In CLEF</source>
          , pages
          <fpage>207</fpage>
          -
          <lpage>220</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Wen-Cheng</surname>
            <given-names>Lin</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yih-Chen Chang</surname>
          </string-name>
          , and
          <string-name>
            <surname>Hsin-Hsi Chen</surname>
          </string-name>
          .
          <article-title>Integrating textual and visual information for cross-language image retrieval</article-title>
          .
          <source>In Proceedings of the Second Asia Information Retrieval Symposium</source>
          , pages
          <fpage>454</fpage>
          -
          <lpage>466</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Wen-Cheng</surname>
            <given-names>Lin</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yih-Chen Chang</surname>
          </string-name>
          , and
          <string-name>
            <surname>Hsin-Hsi Chen</surname>
          </string-name>
          .
          <article-title>Integrating textual and visual information for cross-language image retrieval: A trans-media dictionary approach</article-title>
          .
          <source>Information Processing and Management</source>
          ,
          <volume>43</volume>
          (
          <issue>2</issue>
          ):
          <fpage>488</fpage>
          -
          <lpage>502</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Nicolas</surname>
            <given-names>Maillot</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jean-Pierre</surname>
            <given-names>Chevallet</given-names>
          </string-name>
          , Vlad Valea, and Joo Hwee Lim.
          <article-title>Ipal inter-media pseudo-relevance feedback approach to imageclef 2006 photo retrieval</article-title>
          .
          <source>In Working Notes for the CLEF 2006 Workshop</source>
          , 20-22 September, Alicante, Spain,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Yosi</given-names>
            <surname>Mass</surname>
          </string-name>
          and
          <string-name>
            <given-names>Matan</given-names>
            <surname>Mandelbrod</surname>
          </string-name>
          .
          <article-title>Experimenting various user models for XML retrieval</article-title>
          .
          <source>In [5]</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Takahashi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Oka</surname>
          </string-name>
          .
          <article-title>Image-to-word transformation based on dividing and vector quantizing images with words</article-title>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13] Henning Muller, Thomas Deselaers, Eugene Kim, Jayashree Kalpathy-Cramer,
          <string-name>
            <given-names>Thomas M.</given-names>
            <surname>Deserno</surname>
          </string-name>
          , Paul Clough, and
          <string-name>
            <given-names>William</given-names>
            <surname>Hersh</surname>
          </string-name>
          .
          <article-title>Overview of the ImageCLEFmed 2007 medical retrieval and annotation tasks</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Karen</given-names>
            <surname>Sauvagnat</surname>
          </string-name>
          .
          <article-title>Modèle flexible pour la recherche d'information dans des corpus de documents semi-structurés</article-title>
          .
          <source>PhD thesis</source>
          , Toulouse : Paul Sabatier University,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Cees G. M. Snoek</surname>
          </string-name>
          , Marcel Worring, Jan C. van Gemert,
          <string-name>
            <surname>Jan-Mark Geusebroek</surname>
          </string-name>
          , and
          <string-name>
            <surname>Arnold</surname>
            <given-names>W. M.</given-names>
          </string-name>
          <string-name>
            <surname>Smeulders</surname>
          </string-name>
          .
          <article-title>The challenge problem for automated detection of 101 semantic concepts in multimedia</article-title>
          .
          <source>In MULTIMEDIA '06: Proceedings of the 14th annual ACM international conference on Multimedia</source>
          , pages
          <fpage>421</fpage>
          -
          <lpage>430</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Thijs</given-names>
            <surname>Westerveld</surname>
          </string-name>
          .
          <article-title>Image retrieval: Content versus context</article-title>
          .
          <source>In Content-Based Multimedia Information Access, RIAO 2000 Conference Proceedings</source>
          , pages
          <fpage>276</fpage>
          -
          <lpage>284</lpage>
          ,
          <year>April 2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhao</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Grosky</surname>
          </string-name>
          .
          <article-title>Narrowing the semantic gap - improved text-based web document retrieval using visual features</article-title>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>