<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bioingenium at ImageClefmed 2010: A Latent Semantic Approach</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>National University of Colombia, Bioingenium Research Group</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <abstract>
        <p>This paper describes the participation of the Bioingenium Research Group in the ad hoc Medical Image Retrieval task for the 2010 ImageCLEF forum. The work aimed to explore semantic relationships in textual information and transfer into visual information by building a uni ed image searching index. The proposed strategy is based on the use of Non-Negative Matrix Factorization to decompose matrix data and build latent semantic spaces.</p>
      </abstract>
      <kwd-group>
        <kwd>Non-Negative Matrix Factorization</kwd>
        <kwd>Content-Based Information Retrieval</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The Medical Image Retrieval task of ImageCLEF aims to evaluate computational
methods to access and retrieve visual contents for medical applications. For the
2010 forum, the challenge consists of proposing computational alternatives for
retrieving images from a database containing 77,000 images extracted from
medical papers and 16 query topics [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This paper describes the approach followed
by the Bioingenium Research Group at the National University of Colombia,
which focused on combining visual features extracted from images together with
text captions taken from the manuscript, to build a uni ed index for searching
medical images. Latent Semantic Indexing (LSI) strategies are used at the core
of our approach to model implicit relationships between visual and textual data.
      </p>
      <p>Since our image indexing method integrates visual and textual information,
we are able to map di erent types of queries to the latent semantic space so as
to search for images. Be it a text query, an image example or a mixed query, the
system uses the same index to retrieve relevant images. To achieve this, we use
Non-Negative Matrix Factorization algorithms to build latent semantic spaces
using multimodal information. We paid special attention to the case in which
a visual example is provided as query to search the multimodal index. When
only visual features are used to match contents in the collection, the multimodal
index still has information taken from textual data, which can in uence the
search results.</p>
      <p>These algorithms can be computationally expensive and therefore it might
take too long for them to process large image collections. For this reason, they
had to be suited for running experiments in the ImageCLEFmed 2010 collection
by using a structured initialization strategy based on Singular Value
Decomposition, which allowed the algorithms to converge faster to the desired factorization.
We run experiments that involved di erent con gurations of our model to
evaluate their performance.</p>
      <p>The contents on this paper are organized as follow: Section 2 summarizes the
methods used to process visual and textual data separately. Section 3 presents
the main methods of our retrieval framework. Section 4 presents the
experimental setup, the results and discussions. Finally, the paper ends with concluding
remarks in Section 5.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Data Processing</title>
      <p>In our approach, textual data and visual data are rst processed in an
independent manner. The purpose of this preprocessing step is to build two matrices
that represent the content of each modality for all images in the collection by
using a certain set of features.
2.1</p>
      <sec id="sec-2-1">
        <title>Text Processing</title>
        <p>
          We used the paper title and image caption as semantic context for each image.
The Natural Language Toolkit (NLTK) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] was used to build a vector space
representation of textual information. Common text processing techniques, such
as stop words removal and word stemming, were applied to the corpus and
a TF-IDF weighted scheme was used as nal text representation. Some terms
were removed from the vector space to generate a more compact representation
by pruning terms with too low or too high frequency within the corpus.
2.2
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Visual Processing</title>
        <p>
          We used a Bag of Features strategy in which each image is represented as a
histogram of frequencies of prede ned visual patterns [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. These visual patterns
are organized in a codebook of DCT features [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] that is quantized using the
k-means algorithm. Blocks of 8x8 pixels are rst taken from a regular grid in
each image in the collection and processed using the Discrete Cosine Transform
in the three RGB color channels. The coe cients of these transform are used as
features to construct the codebook and to match visual patterns when building
the histogram representation. The size of the codebook was set to 2,000 and
5,000 features in our experiments.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Retrieval Framework</title>
      <p>
        Latent semantic strategies have been shown to be a powerful set of methods to
nd latent relationships between features in document collections. Using a
termdocument representation, it is possible to nd latent patterns such as highly
correlated terms through matrix decompositions [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. A Singular Value
Decomposition (SVD) of the term-document matrix is used in information retrieval to
construct latent semantic spaces so that documents that refer to certain
topics can be highly scored even when there is not an explicit occurrence of the
query terms. This is possible thanks to the implicit relationships existing
between terms that are found during the indexing process. This technique usually
shows improved retrieval performance compared to a document retrieval engine
that only uses the term frequencies to score documents.
      </p>
      <p>We use Non-negative Matrix Factorization as a exible algorithm to model
a latent semantic space, which correlates visual features and text terms in a
multimodal collection.
3.1</p>
      <sec id="sec-3-1">
        <title>Input data</title>
        <p>The term-document matrix Xt of size mxn is used to represent text data in
the collection, where m is the number of images and n is the number of terms.
The values in each cell are the frequency of the corresponding term for an image.
Similarly, the feature-document matrix Xv is a matrix of size mxp, with m being
the number of images and p the number of visual features.
3.2</p>
      </sec>
      <sec id="sec-3-2">
        <title>Non-negative Matrix Factorization (NMF)</title>
        <p>
          In document clustering as well as in document retrieval and document
classication, an important task is to recognize the semantic relationships existing
between terms in a collection. NMF is a novel technique proposed to address
problems of this nature and has been actively used for the analysis of text
documents [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] as well as image collections [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ][
          <xref ref-type="bibr" rid="ref5">5</xref>
          ][
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. This method decomposes a
non-negative matrix into two lower rank non-negative matrices; the rst one
encoding the basis of the latent space and the second one encoding the
coefcients of the document representation in that latent space. NMF is modeled
as an optimization problem with non-negative restrictions on both matrix
factors for which di erent objective functions can be used. We used the Divergence
objective function to nd the factorization [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]:
        </p>
        <p>X Xij log
i;j</p>
        <p>Xij
W Hij</p>
        <p>Xij + W Hij
(1)
3.3</p>
      </sec>
      <sec id="sec-3-3">
        <title>Singular Value Decomposition (SVD) Initialization</title>
        <p>
          NMF is a powerful matrix decomposition strategy but it has an important
problem to consider: its convergence performance is slow. In order to make it usable
for a large-scale image retrieval application, we applied an initialization
algorithm based on SVD operations [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>
          SVD is a very common matrix decomposition strategy, in which three
matrices are obtained: two orthonormal matrices and a diagonal matrix. The
orthonormal matrices have the particularity of being linearly independent, but the
can still have negative values. Based on an interesting result about the increase
of the rank unit matrix when the negative values are changed to zero, a method
that allows getting a non-negative SVD decomposition is used to build W and
H initial values. This strategy is known as Double Non-Negative Singular Value
Decomposition [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and its use speeds up the indexing process for about 3 times.
3.4
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Latent Representation</title>
        <p>
          We model the factorization problem using an asymmetric algorithm as it is
described in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. In this method, the text matrix Xt is rst exploited to build
the basis of the latent semantic space and then the matrix Xv is used to adapt
the visual representation to such latent semantic space. The algorithm follows
two basic steps:
1. Learning the semantic basis: NMF is applied to solve Xt = WtHt, where Xt
is the text matrix.
2. Adapting a visual basis: A modi ed version of NMF is applied to nd Xv =
WvHt in order to learn only a Wv that spans the same latent space obtained
from the text analysis, but using visual features.
        </p>
        <p>Since the basis of the latent space is formed using textual data, it is expected
that this representation helps reduce the semantic gap in the CBIR system.
This approach is meant to use the same semantic representation found in text
decomposition to calculate the corresponding basis that satis es the NMF
decomposition of the visual data. With the new Wv basis, images lacking text can
be mapped into the semantic space in the same way as text documents are, but
notice that only visual features are required.</p>
        <p>Indexing texts or projecting text queries to the semantic space is
straightforward because a Wt basis has been also learned. When queries have both
modalities available, an automatic strategy to combine both is used. The Wt and Wv
basis are simply concatenated to allow a mixed projection of multimodal data.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <p>We participated with three di erent approaches that are classi ed according to
the information used in the query: only text terms, only visual features and mixed
text-visual information. To tune the model parameters, we used the queries from
the 2009 ImageCLEFmed challenge and tried solving them on the 2010
collection. Our goal was to maximize the Mean Average Precision (MAP) according
to the ground truth for the 2009 queries and then use the same con guration to
solve the 2010 topics.</p>
      <p>The main parameter in our model is the size of the semantic space, which
determines the number of latent concepts. We calculated various semantic spaces
using di erent sizes to evaluate the performance with di erent con gurations.
In order to explore the search space before submitting our runs, we used a
logarithmic scale to set the latent space size parameter.</p>
      <p>Below is a description of our 6 submitted runs:
{ T ext k = 211: only text information was used in the queries as well as
in the collection. In this experiment we used 2,048 as the size of the latent
semantic space.
{ AsymmetricM ixed k = 211: both visual and text information is used to
process the queries. The size of the semantic space was the same as in the
previous experiment.both visual and text information is used to process the
queries. The size of the semantic space was the same as in the previous
experiment.</p>
      <p>The following runs only used visual information to query the system:
{ AsymmetricDCT 2000 k = 25: a codebook of 2,000 visual patterns was
used to represent image features. The size of the latent semantic space was
set to 25 = 32.
{ AsymmetricDCT 5000 k = 25: a codebook with 5,000 elements and 25 = 32
latent semantic dimensions.
{ AsymmetricDCT 5000 k = 27: a codebook with 5,000 elements and 27 =
128 latent semantic dimensions.
{ AsymmetricDCT 5000 k = 27:5: a codebook with 5,000 elements and
27:5 = 181 latent semantic dimensions.</p>
      <p>We provide a comparison of the results obtained with the 2009 and 2010
queries, even though the ground truths were obtained by analyzing di erent
versions of the same collection. In particular, the number of images in the
collection when the ground truth of 2009 was built was about 10% smaller than
the 2010 version. However, this may be considered as a good estimation of the
expected performance.</p>
      <p>
        We observed that when the size of the latent space is increased, a better
performance is obtained using text queries. Table 1 shows the MAP and precision
in 1000 (P1000) obtained using the 2009 and 2010 queries, respectively, with the
corresponding con gurations for each run. Using only visual information on the
2009 queries, the con guration described for AsymDCT 2000 k = 25 (2,000
visual features and 25 latent semantic dimensions) obtained the best performance.
This result is even a very good score compared to the results obtained by
participants in the 2009challenge when using only visual information [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However,
in the 2010 challenge the results were not as encouraging.
      </p>
      <p>The performance results shown in Table 1 show consistency between the
results obtained for the 2009 and 2010 queries, both for MAP and Precision
at 1,000 results (P1000). The search strategy based only on text information
outperformed both the visual and mixed strategies. However, it is well known
that only visual information lacks of semantic information to search for images
and this gap still remains an open problem for image retrieval. Our method is
an attempt to bridge the gap by representing visual information together with
text data in order to bring some semantic structure to the representation space.
Our results did not diverge too much from those obtained by other competitors
in the challenge.</p>
      <p>The mixed approach also showed a better performance compare to the sole
use of visual features, but it is still not as good as using only text, even though
more information is provided to the search engine with this approach. These
results suggest that visual features are not contributing to nding more relevant
images, but are instead reducing the number of retrieved images.
5</p>
    </sec>
    <sec id="sec-5">
      <title>Concluding Remarks</title>
      <p>This paper summarizes an exploratory work in semantic representations by
which the Bioingenium Research Group participates in ImageCLEF2010. The
main goal of this strategy was to build a uni ed indexing method to allow
representing simultaneously visual information as well as textual information using
NMF algorithms. The modeling of latent semantic spaces was followed to
approximate hidden relationships between visual features and text terms.</p>
      <p>To make this strategy feasible for real world image collections, we employed a
structured initialization of the NMF algorithm is based on non-negative
decompositions of large matrix data to help speed up the indexing time. This allowed
us to process the ImageCLEF2010 collection of 77,000 images with their
corresponding textual data and conduct experiments using the ground truth of 2009
and submit runs for the 2010 topics.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Henning</given-names>
            <surname>Mller</surname>
          </string-name>
          , Jayashree Kalpathy-Cramer, Ivan Eggel, Steven Bedrick,
          <string-name>
            <given-names>Charles E. Kahn</given-names>
            <surname>Jr.</surname>
          </string-name>
          , and
          <string-name>
            <given-names>William</given-names>
            <surname>Hersh</surname>
          </string-name>
          .:
          <article-title>Overview of the CLEF 2010 medical image retrieval track</article-title>
          .
          <source>In the Working Notes of CLEF</source>
          <year>2010</year>
          , Padova, Italy,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Gonzalez</surname>
          </string-name>
          , Fabio A. and
          <string-name>
            <surname>Caicedo</surname>
          </string-name>
          ,
          <string-name>
            <surname>Juan</surname>
            <given-names>C.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Nasraoui</surname>
          </string-name>
          , Olfa and
          <string-name>
            <surname>Ben-Abdallah</surname>
          </string-name>
          ,
          <article-title>Jaafar: NMF-based multimodal image indexing for querying by visual example</article-title>
          .
          <source>CIVR '10: Proceedings of the ACM International Conference on Image and Video Retrieval</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Henning</given-names>
            <surname>Mller</surname>
          </string-name>
          ,
          <string-name>
            <surname>Jayashree</surname>
            <given-names>KalpathyCramer</given-names>
          </string-name>
          , Ivan Eggel , Steven Bedrick , Sad Radhouani , Brian Bakke , Charles E. Kahn Jr. , William Hersh:
          <article-title>Overview of the CLEF 2009 medical image retrieval track</article-title>
          .
          <source>In the Working Notes of CLEF</source>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lee</surname>
          </string-name>
          , Daniel D. and
          <string-name>
            <surname>Seung</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Sebastian: Learning the parts of objects by nonnegative matrix factorization</article-title>
          .
          <source>Nature</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cooper</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Foote</surname>
          </string-name>
          , J.:
          <article-title>Summarizing video using non-negative similarity matrix factorization</article-title>
          .
          <source>IEEE Workshop on Multimedia Signal Processing</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Datta</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J. Z.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Image retrieval: Ideas, in uences, and trends of the new age</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>40</volume>
          ,
          <issue>2</issue>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. W. Liu,
          <string-name>
            <given-names>N.</given-names>
            <surname>Zheng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <article-title>Non-negative matrix factorization for visual coding</article-title>
          .
          <source>In IEEE International Conference on Acoustics, Speech and Signal Processing</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>NLTK: the natural language toolkit</article-title>
          .
          <source>In Proceedings of the COLING/ACL on interactive Presentation Sessions (Sydney, Australia, July 17 - 18</source>
          ,
          <year>2006</year>
          ).
          <article-title>Annual Meeting of the ACL</article-title>
          .
          <article-title>Association for Computational Linguistics</article-title>
          , Morristown, NJ,
          <fpage>69</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>E.</given-names>
            <surname>Nowak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Jurie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Triggs</surname>
          </string-name>
          .
          <article-title>Sampling strategies for bag-of-features image classi cation</article-title>
          .
          <source>European Conference on Computer Vision</source>
          . pages
          <fpage>490</fpage>
          {
          <fpage>503</fpage>
          .
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Hare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Samangooei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Nixon</surname>
          </string-name>
          .
          <article-title>Semantic spaces revisited: investigating the performance of auto-annotation and semantic retrieval using semantic spaces</article-title>
          .
          <source>In CIVR '08: Proceedings of the 2008 international conference on Content-based image and video retrieval</source>
          , pages
          <volume>359</volume>
          {
          <fpage>368</fpage>
          , New York, NY, USA,
          <year>2008</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>C. Boutsidis</surname>
          </string-name>
          , E. Gallopoulos,
          <article-title>SVD based initialization: A head start for nonnegative matrix factorization</article-title>
          ,
          <source>Pattern Recognition</source>
          , Volume
          <volume>41</volume>
          ,
          <string-name>
            <surname>Issue</surname>
            <given-names>4</given-names>
          </string-name>
          ,
          <string-name>
            <surname>April</surname>
            <given-names>2008</given-names>
          </string-name>
          , Pages
          <fpage>1350</fpage>
          -
          <lpage>1362</lpage>
          , ISSN 0031-3203.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Paul Pauca, Farial Shahnaz,
          <article-title>Michael Berry and Robert Plemmons: Text Mining using Nonnegative Matrix Factorizations</article-title>
          ,
          <source>Procceedings SIAM Inter. Conference on Data Mining</source>
          , Orlando,
          <year>April 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Lee</surname>
          </string-name>
          and
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Seung</surname>
          </string-name>
          .
          <article-title>Algorithms for nonnegative matrix factorization</article-title>
          .
          <source>Advances in Neural Information Processing Systems</source>
          ,
          <volume>13</volume>
          :
          <fpage>556</fpage>
          {
          <fpage>562</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>