<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Applying LDA in contextual image retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hatem Awadi</string-name>
          <email>awadi.hatem@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mouna Torjmen Khemakhem</string-name>
          <email>torjmen.mouna@redcad.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maher Ben Jemaa</string-name>
          <email>maher.benjemaa@enis.rnu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Research unit on Development and Control of Distributed Applications (ReDCAD), Department of Computer Science and Applied Mathematics, National School of Engineers of Sfax, University of Sfax</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes our participation in the photo Flickr retrieval task at the ImageCLEF 2012 campaign. Our aim is to evaluate the performance of topic models, such as Latent Dirichlet Allocation (LDA), in image retrieval based on the textual information surrounding the images. To do this, we propose to extract topics from Flickr user tags<sup>1</sup> using the LDA topic model. Then, we use the Jensen-Shannon Divergence measure to compute the similarity between queries and the user tags representing images.</p>
      </abstract>
      <kwd-group>
        <kwd>text-based image retrieval</kwd>
        <kwd>Latent Dirichlet Allocation</kwd>
        <kwd>Jensen-Shannon Divergence</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Many works in the image retrieval literature have shown that, in the Web setting,
text-based retrieval is more efficient than content-based retrieval [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ][
        <xref ref-type="bibr" rid="ref9">9</xref>
        ][
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        The common method of searching images by context is to use the text
surrounding the images directly, applying the well-known tf-idf scheme [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which
evaluates how important a word is in a document. While this approach reduces
each document to a set of words that are discriminative for documents in the
collection, it provides a relatively small amount of reduction in description length
and does not capture inter- or intra-document statistical structure [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
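      <p>As a rough illustration of the tf-idf weighting mentioned above, the following minimal sketch uses one common variant (raw term frequency multiplied by the logarithm of the inverse document frequency). The function name <monospace>tf_idf</monospace> and the toy tag lists are assumptions made for this example; they are not taken from the Flickr collection or from any particular retrieval system.</p>
      <preformat>
```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document (raw tf times log idf)."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each word.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    weights = []
    for doc in documents:
        term_freq = Counter(doc)
        weights.append({
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return weights


# Toy tag lists (hypothetical data for illustration only).
docs = [
    ["beach", "sea", "sunset"],
    ["sea", "waves", "surf"],
    ["museum", "art", "sunset"],
]
scores = tf_idf(docs)
# A word found in a single document ("beach") gets a higher weight
# than a word shared by two documents ("sea").
```
      </preformat>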
      <p>To address these problems, latent-dimension techniques can be used to reduce the
term-document matrix to a much lower-dimensional subspace that captures most of
the variance in the corpus. The main idea of these techniques is to model
documents as distributions over topics, where each topic is a distribution over words.</p>
      <p>
        In this paper, we choose to use Latent Dirichlet Allocation (LDA) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to
model image topics. The first step is to extract topics from the user tags
representing images in the given Flickr collection. We then estimate the topic
distribution of the query by inferring the query in the existing topic distributions.
Finally, the Jensen-Shannon Divergence measure [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is used to compute the similarity between
queries and the user tags representing images.
        <fn>
          <label>1</label>
          <p>User tags are a kind of metadata describing the images and allowing them to be found by searching or browsing.</p>
        </fn>
      </p>
      <p>Topic models are widely used in textual information processing and have
proved useful in many tasks. Recently, they have also been used in image
representation and processing.</p>
      <p>
        In the image retrieval domain, LDA is mainly used at the visual level. Horster
et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] represented an image as a bag of visual words and then applied LDA
to extract visual topics. Several similarity measures were tested, among which the
Jensen-Shannon Divergence measure [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] performed best. Greif et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] also
used a Correlated Topic Model (CTM [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). However, this model did not
outperform previous approaches.
      </p>
      <p>
        Elango et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] used the LDA topic model for image clustering. Another
application of LDA is automatic image annotation [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ][
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>In this work, we are interested in applying LDA to the textual information
related to the images. The resulting topics are then used to find images similar to a
textual user query.</p>
      <p>Our paper is organized as follows. Section 2 reviews the LDA topic
model and the similarity measure that we use in image retrieval. Section 3
presents experimental results on the photo Flickr retrieval task, and Section 4
concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>LDA for image retrieval</title>
      <p>The main idea behind the use of a topic model in our work is that an image is
probably an illustration of the overall subject (topic) of the document. User tags
are likely to be a useful feature for representing the image, since they normally
describe the image content. We therefore use user tags to extract the textual topics
of the images. Figure 1 shows an example of a set of topics extracted from Flickr
user tags.</p>
      <p>Topic 1: ocean, beach, sea, coast, pacific, water, waves, rocks, surf, sand</p>
      <p>Topic 2: space, stars, chandra, galaxy, smithsonian, institution, telescope, star, universe, ray</p>
      <p>Topic 3: art, painting, museum, gallery, artist, modern, contemporary, media, mixed, collage</p>
      <p>Topic 4: phone, mobile, cell, camera, sony, cellphone, ericsson, nokia, telephone, blackberry</p>
      <p>Topic 5: sunset, evening, sky, dusk, clouds, silhouette, landscape, sun, twilight, sundown</p>
      <p>Fig. 1. Top 10 words of 5 topics extracted from the Flickr user tags</p>
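      <p>Word lists such as those in Figure 1 can be read off a fitted model by sorting each topic's word distribution. The sketch below assumes the topic-word probabilities are already available as a nested list; the name <monospace>top_words</monospace>, the four-word vocabulary and the probability values are hypothetical and serve only to illustrate the idea.</p>
      <preformat>
```python
def top_words(topic_word, vocabulary, n=10):
    """Return the n most probable words of each topic.

    topic_word[k][v] is the probability of word v under topic k.
    """
    result = []
    for row in topic_word:
        # Word indices sorted from most to least probable for this topic.
        ranked = sorted(range(len(vocabulary)), key=lambda v: row[v], reverse=True)
        result.append([vocabulary[v] for v in ranked[:n]])
    return result


# Hypothetical vocabulary and topic-word probabilities (two topics).
vocab = ["ocean", "beach", "art", "museum"]
phi = [
    [0.50, 0.40, 0.06, 0.04],
    [0.05, 0.05, 0.30, 0.60],
]
tops = top_words(phi, vocab, n=2)
```
      </preformat>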
      <sec id="sec-2-1">
        <title>Latent Dirichlet Allocation</title>
        <p>In a large collection, many documents are about the same idea. Topic
models are used to connect documents that share a similar meaning by
discovering recurring patterns of words.</p>
        <p>The idea behind LDA is to model each document as a distribution over topics,
where each topic defines a distribution over words. Specifically, we assume that
K topics are associated with a collection, and that each document defines a
distribution over (hidden) topics. The posterior probability of these latent variables
determines a hidden decomposition of the collection into topics.</p>
        <p>We have D documents using a vocabulary of V word types. Each
document contains M word tokens. We assume K topics. Each document has a
K-dimensional multinomial θ over topics with a common Dirichlet prior Dir(α).
Each topic has a V-dimensional multinomial ϕ over words with a common
symmetric Dirichlet prior Dir(β).</p>
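        <p>To make the notation concrete, the following sketch estimates θ and ϕ on a toy corpus with a collapsed Gibbs sampler. This is only one of several possible inference procedures and is an assumption of this example: the paper does not state which inference algorithm or implementation was used. The function name <monospace>lda_gibbs</monospace>, the toy documents and the hyperparameter values are likewise hypothetical.</p>
        <preformat>
```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha, beta, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for LDA on documents given as lists of word ids.

    Returns (theta, phi): per-document topic distributions and per-topic
    word distributions, estimated from the final sample.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                 # counts n_dk
    topic_word = [[0] * vocab_size for _ in range(n_topics)]   # counts n_kv
    topic_total = [0] * n_topics                               # counts n_k
    assignments = []
    # Assign a random initial topic to every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + vocab_size * beta)
         for v in range(vocab_size)]
        for k in range(n_topics)
    ]
    return theta, phi


# Toy corpus of tag ids (hypothetical):
# 0=ocean 1=beach 2=sea 3=art 4=museum 5=gallery
toy_docs = [[0, 1, 2, 0], [1, 2, 2], [3, 4, 5], [4, 5, 3, 3]]
theta, phi = lda_gibbs(toy_docs, n_topics=2, vocab_size=6, alpha=0.1, beta=0.01)
```
        </preformat>
        <p>Here D = 4, V = 6 and K = 2; each row of <monospace>theta</monospace> is a K-dimensional distribution over topics and each row of <monospace>phi</monospace> is a V-dimensional distribution over words.</p>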
        <p>Figure 2 shows the various components of this model.</p>
        <p>[Figure 2: graphical model of LDA, with nodes α, θ, z, w, ϕ and β and plates M, D and K.]</p>
        <p>The generative process of LDA is described as follows:</p>
        <p>(1) For each topic,</p>
        <p>(a) Draw a distribution over words ϕ ∼ Dir(β)</p>
        <p>(2) For each document,</p>
        <p>(a) Determine topic distribution θ<sub>d</sub> ∼ Dir(α)</p>
        <p>(b) For each word,</p>
        <p>(i) Generate topic z ∼ Mult(θ)</p>
        <p>(ii) Generate word w ∼ Mult(ϕ).</p>
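        <p>The generative steps above can be sketched as a small simulation. This is an illustrative sketch under stated assumptions, not the authors' code: the helper names <monospace>sample_dirichlet</monospace> and <monospace>generate_corpus</monospace>, the fixed document length and the hyperparameter values are all chosen for this example. Dirichlet draws are obtained by normalising independent Gamma draws.</p>
        <preformat>
```python
import random


def sample_dirichlet(rng, concentration):
    """Draw from a Dirichlet distribution by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(n_docs, doc_len, n_topics, vocab_size, alpha, beta, seed=0):
    """Simulate the LDA generative process; words are integer ids."""
    rng = random.Random(seed)
    # (1) For each topic, draw a distribution over words: phi ~ Dir(beta).
    phi = [sample_dirichlet(rng, [beta] * vocab_size) for _ in range(n_topics)]
    corpus = []
    for d in range(n_docs):
        # (2a) Draw the document's topic distribution: theta_d ~ Dir(alpha).
        theta = sample_dirichlet(rng, [alpha] * n_topics)
        words = []
        for n in range(doc_len):
            # (2b-i) Draw a topic: z ~ Mult(theta_d).
            z = rng.choices(range(n_topics), weights=theta)[0]
            # (2b-ii) Draw a word from that topic: w ~ Mult(phi_z).
            w = rng.choices(range(vocab_size), weights=phi[z])[0]
            words.append(w)
        corpus.append(words)
    return corpus, phi


# Toy settings (hypothetical values, chosen only to keep the example small).
corpus, phi = generate_corpus(
    n_docs=5, doc_len=8, n_topics=3, vocab_size=10, alpha=0.5, beta=0.5
)
```
        </preformat>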
      </sec>
      <sec id="sec-2-2">
        <title>Similarity measure</title>
        <p>
          After running LDA on a corpus, its output can be used to compare
documents with each other. In our case, each document (the user tags of an image)
has a distribution over topics. Many works use the KL-divergence [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] to measure the distance between topic distributions,
and therefore the distance between documents, as follows:
        </p>
        <p>
          <disp-formula>
            <label>(1)</label>
            <tex-math>D_{KL}(P \| Q) = \sum_{i} P(i) \ln \frac{P(i)}{Q(i)}</tex-math>
          </disp-formula>
          where P and Q are two probability distributions over topics of two documents
p and q.</p>
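        <p>As a minimal sketch of equation 1, the function below computes the KL divergence between two discrete distributions given as lists of probabilities. The name <monospace>kl_divergence</monospace> and the example distributions are assumptions of this illustration. The sketch treats terms with P(i) = 0 as contributing zero and assumes Q(i) is positive wherever P(i) is positive.</p>
        <preformat>
```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) ln(P(i) / Q(i)), skipping terms with P(i) = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Two hypothetical topic distributions over three topics.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
forward = kl_divergence(p, q)   # D_KL(P || Q)
backward = kl_divergence(q, p)  # D_KL(Q || P)
```
        </preformat>
        <p>Evaluating the measure in both directions on the same pair of distributions gives two different values, and the divergence of a distribution from itself is zero.</p>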
        <p>
          However, the KL-divergence is not symmetric, i.e. D<sub>KL</sub>(P||Q) ≠
D<sub>KL</sub>(Q||P). The Jensen-Shannon divergence [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], a symmetric measure derived from the KL divergence, is widely used instead.
To compare two distributions P and Q using the Jensen-Shannon divergence,
equation 2 is applied:
          <disp-formula>
            <label>(2)</label>
            <tex-math>D_{JS}(P \| Q) = \frac{1}{2} D_{KL}(P \| M) + \frac{1}{2} D_{KL}(Q \| M), \quad M = \frac{1}{2}(P + Q)</tex-math>
          </disp-formula>
          where M is the average of the two distributions.
        </p>
        <p>For the photo Flickr retrieval task, we use a subset of the MIRFLICKR<sup>2</sup>
collection composed of 200 000 images. A set of 42 textual queries is
used to perform LDA-based image retrieval.</p>
        <p>The number of topics K cannot be fixed precisely, because it
depends on many factors, chiefly the collection size (the number of documents).
The larger the collection, the larger the number of topics, so a large
dataset needs a large K. In our experiment we set this number to 1000, since we
have a large collection. We keep the standard settings for the other parameters:
α = 50/K, β = 0.01.</p>
        <p>
          In this section, we present the results of our single official run of the LDA model.
Table 1 shows the obtained result [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
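        <p>The retrieval step based on the Jensen-Shannon divergence described above can be sketched as follows, using its standard definition as the average KL divergence of each distribution to their mixture. The function names, the image identifiers and the topic distributions are hypothetical, and the sketch assumes that the topic distributions of the query and of each image have already been inferred; it does not reproduce the official run.</p>
        <preformat>
```python
import math


def kl_divergence(p, q):
    """KL divergence between two discrete distributions (terms with p = 0 skipped)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the mixture M = (P + Q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def rank_images(query_topics, image_topics):
    """Return image ids sorted from most to least similar to the query."""
    return sorted(
        image_topics,
        key=lambda image_id: js_divergence(query_topics, image_topics[image_id]),
    )


# Hypothetical topic distributions over K = 3 topics.
images = {
    "img_a": [0.8, 0.1, 0.1],
    "img_b": [0.1, 0.8, 0.1],
    "img_c": [0.3, 0.3, 0.4],
}
query = [0.7, 0.2, 0.1]
ranking = rank_images(query, images)
```
        </preformat>
        <p>A smaller divergence means a closer match, so images are returned in increasing order of divergence from the query. Unlike the KL divergence, the measure is well defined even when one distribution assigns zero probability to a topic, because the mixture M is positive wherever either input is.</p>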
        <p>According to the obtained results, our method does not perform very well
compared to the best run in the ImageCLEF 2012 competition. A possible
explanation of this result is that a user query is generally composed of only a few words.
Consequently, little can be inferred about its topic.
          <fn>
            <label>2</label>
            <p>http://mirflickr.liacs.nl</p>
          </fn>
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>This work studies the impact of textual topics in image retrieval. We have applied
the LDA topic model to the user tags representing images. Results show that
this approach does not perform very well. In future work, we plan to improve the
results by using query expansion to better identify the possible
topic of the query.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lafferty</surname>
            ,
            <given-names>J. D.</given-names>
          </string-name>
          ,
          <article-title>A correlated topic model of science</article-title>
          .
          <source>The Annals of Applied Statistics</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>17</fpage>
          -
          <lpage>35</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lafferty</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <article-title>Latent Dirichlet allocation</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Elango</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jayaraman</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <article-title>Clustering Images Using the Latent Dirichlet Allocation Model</article-title>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Horster</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lienhart</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slaney</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>Image retrieval on large-scale image databases</article-title>
          .
          <source>In CIVR '07: Proceedings of the 6th ACM international conference on Image and video retrieval</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Greif</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horster</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lienhart</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <article-title>Correlated Topic Models for Image Retrieval</article-title>
          .
          <source>Technical Report TR2008-09</source>
          , Institut für Informatik, Universität Augsburg, July
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kullback</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leibler</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <article-title>On Information and Sufficiency</article-title>
          .
          <source>Annals of Mathematical Statistics</source>
          ,
          <volume>22</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          (
          <year>1951</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <article-title>Divergence measures based on the Shannon entropy</article-title>
          .
          <source>IEEE Trans. Infor. Theory</source>
          ,
          <volume>37</volume>
          , pp.
          <fpage>145</fpage>
          -
          <lpage>151</lpage>
          (
          <year>1991</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Min</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leveling</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G. J.F.</given-names>
          </string-name>
          ,
          <article-title>Document expansion for text-based image retrieval at WikipediaMM 2010</article-title>
          . In: CLEF labs
          <year>2010</year>
          ,
          <article-title>Cross Language Image Retrieval (ImageCLEF</article-title>
          ), pp.
          <fpage>22</fpage>
          -
          <issue>23</issue>
          <year>September 2010</year>
          , Padua, Italy (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Paredes</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <article-title>Large-Scale Text to Image Retrieval Using a Bayesian -Neighborhood Model</article-title>
          . SSPR/SPR, pp.
          <fpage>483</fpage>
          -
          <lpage>492</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Putthividhya</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Attias</surname>
            ,
            <given-names>H. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nagarajan</surname>
            ,
            <given-names>S. S.</given-names>
          </string-name>
          ,
          <article-title>Topic Regression Multi-Modal Latent Dirichlet Allocation for Image Annotation</article-title>
          .
          <source>In: CVPR IEEE</source>
          <year>2010</year>
          , pp.
          <fpage>3408</fpage>
          -
          <lpage>3415</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGill</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <source>Introduction to Modern Information Retrieval</source>
          .
          <publisher-name>McGraw-Hill</publisher-name>
          (
          <year>1983</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Thomee</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popescu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <article-title>Overview of the ImageCLEF 2012 Flickr Photo Annotation and Retrieval Task, CLEF 2012 working notes</article-title>
          , Rome, Italy (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mori</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <article-title>Max-margin Latent Dirichlet Allocation for Image Classification and Annotation</article-title>
          .
          <source>British Machine Vision Conference (BMVC)</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Yiming</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ivor</surname>
            <given-names>W. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiebo</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <article-title>Textual Query of Personal Photos Facilitated by Large-Scale Web Data</article-title>
          .
          <source>IEEE Trans. Pattern Anal. Mach. Intell</source>
          ,
          <volume>33</volume>
          (
          <issue>5</issue>
          ), pp.
          <fpage>1022</fpage>
          -
          <lpage>1036</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>