<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Increasing cluster recall of cross-modal image retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simon Rácz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Attila Pereszlényi</string-name>
          <email>peresz@ilab.sztaki.hu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bálint Daróczy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mátyás Brendel</string-name>
          <email>mbrendel@ilab.sztaki.hu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data Mining and Web Search Research Group, Informatics Laboratory, Computer and Automation Research Institute of the Hungarian Academy of Sciences</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe our approach to the ImageCLEF Photo and WikiMediaMM 2008 tasks. The novelty of our method lies in combining image segment based image retrieval with our text based approach. We rank text hits with our own Okapi BM25 based information retrieval system and image similarities by a feature vector describing the visual content of image segments. Images were segmented by a home-developed segmenter. We use automatic query expansion, adding new terms from the top ranked documents. Queries were generated automatically from the title and the down-weighted description words.</p>
      </abstract>
      <kwd-group>
        <kwd>Cross-modal retrieval</kwd>
        <kwd>image annotation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Dávid Siklósi</p>
      <p>
        András Benczúr
In this paper we describe our approach to the ImageCLEF Photo and WikiMediaMM 2008
evaluation campaigns [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. ImageCLEF Photo is over the IAPR TC-12 benchmark of 20,000 tourist
photos [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and WikiMediaMM is over the INEX MM image database which contains approximately
150,000 images that cover diverse topics of interest. These images are associated with
unstructured and noisy textual annotations in English. Both campaigns are ad-hoc image retrieval tasks:
find as many relevant images as possible from the image collections.
      </p>
      <p>
        The key feature of our solution in both cases is to combine text based and content based image
retrieval. Our method is similar to the method we applied last year for ImageCLEF Photo [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Our CBIR method is based on segmentation of the image and on the comparison of features of
the segments. We use the Hungarian Academy of Sciences search engine [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as our information
retrieval system that is based on Okapi BM25 and automatic query expansion.
      </p>
    </sec>
    <sec id="sec-2">
      <title>The base text search engine</title>
      <p>
        We use the Hungarian Academy of Sciences search engine [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as our information retrieval system.
Its ranking algorithm was developed originally for HTML pages. It uses the Okapi BM25 ranking
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] with the proximity of query terms taken into account [
        <xref ref-type="bibr" rid="ref3 ref7">7, 3</xref>
        ]. We apply stopword removal and
stemming with the Porter stemmer. We extended the stopword list with terms such as photo or
image that are frequently used in annotations but do not have a distinctive meaning in this
task.
      </p>
      <p>We considered the text annotations of images, including the title, the description and (in the case
of ImageCLEF Photo) the location, separately. The ranking algorithm uses different weights
depending on which part of the document contains the query term. Since many topics have a location
reference, we obtained the best results when the weight of hits inside the location was much higher than the
weights of the title and the description.</p>
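      <p>The field-weighted ranking described above can be sketched as follows. The field weights, function names and BM25 parameters here are illustrative assumptions, not the values used by the engine.</p>

```python
import math

# Hypothetical per-field weights: hits in the location field count far more
# than hits in the title or description, as described in the text.
FIELD_WEIGHTS = {"location": 5.0, "title": 1.0, "description": 0.5}

def bm25_term_score(tf, doc_len, avg_len, df, n_docs, k1=1.2, b=0.75):
    """Okapi BM25 contribution of a single term occurring tf times in a field."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return idf * norm

def score_document(hits, n_docs):
    """Sum field-weighted BM25 scores; hits is a list of
    (field, tf, field_len, avg_field_len, df) tuples for one query."""
    return sum(
        FIELD_WEIGHTS[field] * bm25_term_score(tf, dl, al, df, n_docs)
        for field, tf, dl, al, df in hits
    )
```

      <p>With identical term statistics, a hit in the location field then outscores the same hit in the description by the ratio of the field weights.</p>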
      <p>
        For queries we use the title and the description of the topics with different weights. In addition to
stop words, we also removed sentences containing the phrase not relevant. We apply automatic
query expansion based on the method described in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. For a given query Q, we rank every word w
in the top 10 documents according to the following formula.
      </p>
      <p>Score(Q, w) = Σ_{t_i ∈ Q} idf_i · log(δ + log(af(w, t_i)) · idf_w / log(n))   (1)</p>
      <p>1. af(w, t_i) = Σ_{j=1}^{10} ft_ij · fw_j
2. idf_i = max{1, log((N − N_i + 0.5)/(N_i + 0.5))}
3. idf_w = max{1, log((N − N_w + 0.5)/(N_w + 0.5))}
Here ft_ij is the number of occurrences of t_i in the jth document and similarly fw_j is the number
of occurrences of w in the jth document. In lines 2 and 3, N is the total number of
documents in the collection, N_i is the number of documents containing t_i and N_w is the number
of documents containing w. We use a correction term δ to avoid minus infinity scores.</p>
      <p>After these computations we expanded the query with those new words w whose score was
above zero, giving them the query term weight (Score(w) + 10)/500. We also
ensured that at most the first 15 words with highest rank were attached to the query.</p>
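      <p>A minimal sketch of this expansion step follows. The toy representation of documents as token lists, and skipping words that never co-occur with a query term, are our assumptions; the scoring and the (Score + 10)/500 weighting follow the text.</p>

```python
import math
from collections import Counter

def expansion_scores(query_terms, top_docs, n_docs, doc_freq, delta=0.1):
    """Score candidate expansion words w over the top documents (Equation 1).
    top_docs: token lists of the top-ranked documents; doc_freq: term -> N_t."""
    def idf(df):
        return max(1.0, math.log((n_docs - df + 0.5) / (df + 0.5)))

    counts = [Counter(doc) for doc in top_docs]
    candidates = set().union(*top_docs) - set(query_terms)
    scores = {}
    for w in candidates:
        s = 0.0
        for t in query_terms:
            # af(w, t): co-occurrence of w and t summed over the top documents
            af = sum(c[t] * c[w] for c in counts)
            if af > 0:  # assumption: non-co-occurring pairs contribute nothing
                s += idf(doc_freq[t]) * math.log(
                    delta + math.log(af) * idf(doc_freq[w]) / math.log(n_docs)
                )
        scores[w] = s
    return scores

def expand_query(scores, max_terms=15):
    """Keep at most the 15 best positive-score words, weighted (Score+10)/500."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(w, (s + 10) / 500) for w, s in ranked[:max_terms] if s > 0]
```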
    </sec>
    <sec id="sec-3">
      <title>The content based IR system</title>
      <p>
        The task of the CBIR part was to help annotation based retrieval with a content based method.
Our method for this year is similar to that of 2007 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The basis of our method is to find
segments on the images of the collection similar to the segments of the sample images. We used
the Felzenszwalb and Huttenlocher [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] graph based segmentation algorithm with a different number
of segments for the two tasks, chosen as most effective by manual investigation. The number
of segments for different runs varies, with an extreme case of a single segment per image
corresponding to global similarity measurement. Images were resized to the same size prior to
segmentation.
      </p>
      <p>We extracted 20 features per segment for mean color, size, shape information, and histogram information.
Our histograms had 5 bins in each channel. In addition we used contrast and 2D Fourier
coefficients. Contrast means the maximal and minimal values of the L-channel in HSL color space. The
discrete Fourier transform was sampled along a zig-zag order, i.e. only the low frequency
components were included. The DFT features were weighted 10 times higher than the other
features. Similarity is measured in the above feature space as</p>
      <p>dist(S_i, S_j) = d(F(S_i), F(S_j)), S_i ∈ S(X), S_j ∈ S(I)   (2)</p>
      <p>where S(X) is the set of segments of an image X of the collection and S(I) is the set of segments
of the sample image I; d is a distance function in the feature space and F assigns features to image
segments.</p>
      <p>Given the distance dist(S, S′) of two segments, the distance of image X to sample image I is
computed from pairwise distances between the segments S(X) and S(I) of images X and I,
respectively. In the base method we average over S_j ∈ S(I), taking for each S_j the
closest segment from S(X):</p>
      <p>dist(X, I) = (1/|S(I)|) Σ_{S_j ∈ S(I)} min{dist(S_i, S_j) : S_i ∈ S(X)}   (3)</p>
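      <p>The base image distance of Equation 3 can be sketched directly. Euclidean distance stands in for the distance function d, which the text leaves unspecified.</p>

```python
import numpy as np

def segment_distance(f1, f2):
    """d(F(S_i), F(S_j)): Euclidean distance between segment feature vectors
    (an assumption; the distance function d is not specified in the text)."""
    return float(np.linalg.norm(np.asarray(f1) - np.asarray(f2)))

def image_distance(segments_X, segments_I):
    """Equation 3: for each segment of sample image I take the closest
    segment of collection image X, then average over the sample segments."""
    return sum(
        min(segment_distance(sx, si) for sx in segments_X) for si in segments_I
    ) / len(segments_I)
```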
      <p>We introduce another method for computing image distances from segment distances that also
takes the relative position of the segments into account. For a pair of images I and X, first for each
segment S in I the most similar segment R(S) is searched for in X. Then we look for the N closest
neighbors S_1, . . . , S_N that have a common border with S. We define R(S_i) as the segment of X
closest in relative spatial position to R(S). The distance is computed as a weighted combination of these
segment and spatial distances, where dist∗ is spatial distance within the image; the weights used
are a_1 = 1.0, a_2 = 0.8 and a_3 = 0.2.</p>
      <p>When combining TBIR and CBIR we considered TBIR much more reliable than CBIR. Since
image distance decreases with relevance, we turned CBIR distances into scores by subtracting them from a
sufficiently high constant that keeps the score always positive.</p>
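      <p>The combination step can be sketched as follows. The mixing weights and the constant are illustrative assumptions; only the subtract-from-a-constant construction comes from the text.</p>

```python
def combined_score(text_score, cbir_distance, w_text=1.0, w_cbir=0.25, c=1000.0):
    """Mix TBIR and CBIR: the CBIR distance (smaller = more relevant) is
    subtracted from a large constant c so it becomes a positive score.
    The weights and the constant are illustrative, not the paper's values."""
    cbir_score = c - cbir_distance
    assert cbir_score > 0, "c must exceed the largest possible distance"
    return w_text * text_score + w_cbir * cbir_score
```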
    </sec>
    <sec id="sec-4">
      <title>The WikiMedia Task</title>
      <p>In the WikiMedia task, part of the topics included a sample image. For these topics we combined
the text score with the visual score described next. First we resized images to a maximum size of
800×800, keeping the aspect ratio. We used a global one segment per image method in addition
to a medium granularity segmentation with a minimum segment size of 1500 pixels. This latter
method resulted in fewer than 100 segments per image.</p>
      <p>The WikiMedia task, with over 100,000 images, already raises scalability issues for a CBIR system. The
total size of the feature vectors reached 10 gigabytes, hence pairwise segment similarity scores
were computed by a parallel matrix multiplication algorithm for which we utilized a computer
cluster. For a single sample image with fewer than 100 segments, similarity computation against all
100,000 images took over 5 CPU hours by this method. This implies that for a larger collection,
similarity search data structures will have to be used.</p>
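      <p>One way to batch such pairwise computations into a single matrix product, which parallelizes naturally by sharding rows across a cluster, is sketched below. Cosine similarity via normalized dot products is our stand-in; squared Euclidean distances also reduce to matrix products.</p>

```python
import numpy as np

def pairwise_segment_similarities(sample_feats, collection_feats):
    """All sample-segment x collection-segment similarities in one matrix
    multiplication; rows of the collection matrix can be sharded over machines."""
    q = np.asarray(sample_feats, dtype=np.float64)      # shape (s, d)
    c = np.asarray(collection_feats, dtype=np.float64)  # shape (m, d)
    # normalize rows so the dot product becomes cosine similarity
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    c /= np.linalg.norm(c, axis=1, keepdims=True)
    return q @ c.T                                      # shape (s, m)
```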
      <p>There were several variants of our CBIR method for this task. Due to computational limitations
we only used the more complex distance function based on the relative position of the segments.
The comparison with the simple averaging method will be performed in the near future.</p>
      <p>Variants submitted were named as follows. Since the distance of two images as defined in the
previous section is asymmetric, we computed both dist(I, X) and dist(X, I). The label avg stands for
the average of the two, while in avgw we give 70% weight to the distance of the target image from
the sample image, dist(I, X).</p>
      <p>We also used global, single segment per image runs labeled glob10. Here the Fourier coefficients
had weight 10.0 and the other features weight 1.0. The variant avgw_glob10 combined avgw with
glob10, where the latter had half the weight of the former.</p>
      <p>In the Photo Retrieval Task we used medium sized segmentation as in the WikiMedia task. We
only used plain segment distances with no relative position taken into account; a comparison will
be performed in the near future. In distance computation we weighted the features as follows. For
variant w5 we had size 1; ratio 3; average color 1; shape 1; contrast 1. For w10: size 1; ratio
3; average color 20; histogram 50; shape 50 and contrast 10.</p>
      <p>In the Photo Task, 3 sample images were given for each topic. Consequently, for each image
there are 3 distances, which can be combined as average, minimum or maximum. We
tried all of them, indicated in the name of the variant as min, max or avg.</p>
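      <p>The three combination rules named in the run labels amount to the following one-liner sketch.</p>

```python
def combine_sample_distances(distances, mode):
    """Combine the 3 per-sample-image distances of one collection image
    into a single value, as indicated by min, max or avg in the run name."""
    rules = {"min": min, "max": max, "avg": lambda d: sum(d) / len(d)}
    return rules[mode](list(distances))
```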
      <p>The best performing pure CBIR MAP score, 0.0243, was obtained by w5_min, with a runner up
of 0.0212 by w10_min. Official runs submitted were unfortunately combined with the avg variants
that performed below a MAP of 0.003. In combination with the TBIR score we were able to
improve the MAP from 0.2978 to 0.3014 by w10_min. Official runs performed slightly below this
value.</p>
      <p><bold>Increasing cluster recall</bold></p>
      <p>We modified our method to achieve greater diversity within the top 20. For each topic in the
ImageCLEF Photo set, relevant images were manually clustered into sub-topics. Evaluation was
based on two measures: precision at 20 and instance recall at rank 20 (also called S-recall),
which measures the percentage of different clusters represented in the top 20.</p>
      <p>Our method works as follows. Let Orig(i) be the ith document (0 ≤ i &lt; 999) and OrigSc(i)
be the score of this element on the original list for a given query Q_j. We modified these scores
by giving penalties to the scores of the documents based on their Kullback-Leibler distance. We
used the following algorithm.</p>
      <p>1. New(0) = Orig(0) and NewSc(0) = OrigSc(0)
2. For i = 1 to 20
(a) New(i) = argmax_k {CL_i(k) : i ≤ k &lt; 999}
(b) NewSc(i) = max {CL_i(k) : i ≤ k &lt; 999}
(c) For ℓ = 0 to (i − 1): NewSc(ℓ) = NewSc(ℓ) + c(i)</p>
      <p>Here CL_i(k) = OrigSc(k) + α Σ_{ℓ=0}^{i−1} KL(ℓ, k), where α is a tunable parameter and KL(ℓ, k) is the
Kullback-Leibler distance of the ℓth and kth documents. We used a correction term c(i) at Step
2c to ensure that the new scores remain in descending order.</p>
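      <p>The greedy re-ranking can be sketched as follows. The representation of documents as smoothed term distributions is our assumption, and the correction term c(i) is omitted since the sketch returns an ordering rather than adjusted scores.</p>

```python
import math

def kl_distance(p, q, eps=1e-9):
    """Kullback-Leibler distance between two term distributions
    (aligned lists of probabilities; eps smooths zero entries)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def diversify(orig_scores, term_dists, alpha=0.5, k=20):
    """Greedy top-k re-ranking: repeatedly pick the document maximizing its
    original score plus alpha times its summed KL distance to the documents
    already selected, so near-duplicates of earlier picks are demoted."""
    n = len(orig_scores)
    selected = [max(range(n), key=lambda i: orig_scores[i])]
    for _ in range(min(k, n) - 1):
        remaining = [i for i in range(n) if i not in selected]
        def cl(i):
            diversity_bonus = sum(kl_distance(term_dists[j], term_dists[i])
                                  for j in selected)
            return orig_scores[i] + alpha * diversity_bonus
        selected.append(max(remaining, key=cl))
    return selected
```

      <p>With α = 0 the original order is kept; a positive α promotes documents whose term distribution differs from the already selected ones.</p>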
      <p><bold>Results</bold></p>
      <p>The numbers in the run names indicate the value of α from the previous section. As can be seen,
the algorithm used to increase CR20 was successful with α = 0.5. The cost of increasing CR20
was a worse MAP and P20. The value α = 5.0 turned out to be too high. Using CBIR and query
expansion improved the algorithm a little, and the same trend holds for α.</p>
      <p>Unfortunately at the moment we do not have evaluations for text+qe and text+image without
qe to check which part is responsible for the improvement.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and future work</title>
      <p>We have demonstrated that image retrieval based on segmentation can improve the
performance of an IR system. In future work we will conduct a more thorough comparison of the
different CBIR techniques introduced. We also plan to improve the performance of query
expansion.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>András</given-names>
            <surname>Benczúr</surname>
          </string-name>
          , István Bíró, Mátyás Brendel, Károly Csalogány, Bálint Daróczy, and Dávid Siklósi.
          <article-title>Cross-modal retrieval by text and image feature biclustering</article-title>
          .
          <source>In Working Notes for the CLEF 2007 Workshop</source>
          , Budapest, Hungary,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>András A.</given-names>
            <surname>Benczúr</surname>
          </string-name>
          , Károly Csalogány, Eszter Friedman, Dániel Fogaras, Tamás Sarlós, Máté Uher, and
          <string-name>
            <given-names>Eszter</given-names>
            <surname>Windhager</surname>
          </string-name>
          .
          <article-title>Searching a small national domain - preliminary report</article-title>
          .
          <source>In Proceedings of the 12th World Wide Web Conference (WWW)</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Büttcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Charles L. A.</given-names>
            <surname>Clarke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Brad</given-names>
            <surname>Lushman</surname>
          </string-name>
          .
          <article-title>Term proximity scoring for ad-hoc retrieval on very large text collections</article-title>
          .
          <source>In SIGIR '06</source>
          , pages
          <fpage>621</fpage>-<lpage>622</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Pedro F.</given-names>
            <surname>Felzenszwalb</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniel P.</given-names>
            <surname>Huttenlocher</surname>
          </string-name>
          .
          <article-title>Efficient graph-based image segmentation</article-title>
          .
          <source>International Journal of Computer Vision</source>
          ,
          <volume>59</volume>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Grubinger</surname>
          </string-name>
          , Paul Clough, Allan Hanbury, and
          <string-name>
            <given-names>Henning</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>Overview of the ImageCLEFphoto 2007 photographic retrieval task</article-title>
          .
          <source>In Working Notes of the 2007 CLEF Workshop</source>
          , Budapest, Hungary,
          <year>September 2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Grubinger</surname>
          </string-name>
          , Paul Clough, Henning Müller, and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Deselaers</surname>
          </string-name>
          .
          <article-title>The IAPR TC-12 benchmark: a new evaluation resource for visual information systems</article-title>
          .
          <source>In OntoImage</source>
          , pages
          <fpage>13</fpage>-<lpage>23</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Yves</given-names>
            <surname>Rasolofo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jacques</given-names>
            <surname>Savoy</surname>
          </string-name>
          .
          <article-title>Term proximity scoring for keyword-based retrieval systems</article-title>
          .
          <source>In ECIR</source>
          , pages
          <fpage>207</fpage>-<lpage>218</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Stephen E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Karen</given-names>
            <surname>Sparck Jones</surname>
          </string-name>
          .
          <article-title>Relevance weighting of search terms</article-title>
          .
          <source>In Document retrieval systems</source>
          , pages
          <fpage>143</fpage>-<lpage>160</lpage>
          . Taylor Graham Publishing, London, UK,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.B.</given-names>
            <surname>Croft</surname>
          </string-name>
          .
          <article-title>Query expansion using local and global document analysis</article-title>
          .
          <source>Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>4</fpage>-<lpage>11</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>